Abstract

The renewed interest in the foundations of quantum statistical mechanics in
recent years has led us to study John von Neumann's 1929 article on the quantum
ergodic theorem. We have found this almost forgotten article, which until now
has been available only in German, to be a treasure chest, and to be much
misunderstood. In it, von Neumann studied the long-time behavior of macroscopic
quantum systems. While one of the two theorems announced in his title, the
one he calls the "quantum H-theorem," is actually a much weaker statement than
Boltzmann's classical H-theorem, the other theorem, which he calls the "quantum
ergodic theorem," is a beautiful and very non-trivial result. It expresses a fact we
call "normal typicality" and can be summarized as follows: For a "typical" finite
family of commuting macroscopic observables, every initial wave function $\psi_0$ from
a micro-canonical energy shell so evolves that for most times $t$ in the long run,
the joint probability distribution of these observables obtained from $\psi_t$ is close to
their micro-canonical distribution.
1 Introduction



In recent years, there has been renewed interest in the foundations of quantum statistical
mechanics, see, e.g., [?, 52, 55, ?, ?]. Our own research in this direction has led us to
questions which we later discovered had already been addressed, and some in fact solved,
by John von Neumann in his 1929 article on the quantum ergodic theorem (QET) [71].
This article concerns the long-time behavior of macroscopic quantum systems, and in
particular the approach to thermal equilibrium. We have found the article very useful,
and think that it will also be of interest to a wider audience interested in the foundations
of quantum statistical mechanics. Here we present an English translation of the 1929
QET article by R. Tumulka, together with some commentary. In this commentary, we
describe von Neumann's results in a non-technical (at least, less technical) way, elaborate
on the aspects that we think need elucidation, and put the result into perspective by
comparing it to current work on this topic. N.B. All results to date are still far from
solving the mathematical problems concerning the quantitative approach to thermal
equilibrium of realistic classical or quantum systems. Even less is known rigorously
about properties of non-equilibrium systems; e.g., we are not able to derive the heat
equation from either classical or quantum mechanics.

Von Neumann's book on the "Mathematical Foundations of Quantum Mechanics"
[73], published in German in 1932 and in English in 1955, also contains some
thermodynamic considerations in Chapter V. This chapter, however, has little overlap with
the QET article, whose content is mentioned only in two brief sentences. "The reader
who is interested in this problem," von Neumann writes on page 416 of his book, "can
refer to the treatments in the references" [i.e., to the QET article]. We actually found
the QET article more illuminating than Chapter V of his book.

The QET article is topical also in the following way. There is no consensus about
the definition of thermal equilibrium for a quantum (or even a classical) system in
microscopic terms; the main divide in the literature lies between a view that can be
called the ensemblist view, according to which a system is in thermal equilibrium if it is
in a mixed state (represented by an ensemble) that is close to the canonical (or
micro-canonical) mixed state, and a view that can be called the individualist view, according
to which a system in a pure state (or a point in phase space) can very well be in thermal
equilibrium, depending on the state. The ensemblist view has traditionally prevailed,
but the individualist view has gained ground recently (see, e.g., [12, ?, ?, ?, ?, ?,
57, ?, 23]). While von Neumann's ideas contain elements of both views, the QET
is based mainly on the individualist view; indeed, he considered an isolated quantum
system described by a pure state that evolves unitarily. We will elaborate on these two
views in Section 5 below.

The QET article contains two theorems, mentioned already in its title: one von
Neumann called the quantum ergodic theorem, the other the quantum H-theorem (in
analogy to Boltzmann's H-theorem in classical mechanics [8]). These two theorems are
so closely related to each other in substance that one and the same proof establishes
both of them. For this reason, and because the "quantum H-theorem" actually asserts
much less than Boltzmann's H-theorem, we will discuss it only in Section 7 below and
focus otherwise on the QET.

We will convey the content of the QET in Section 2. It expresses a precise version
of a phenomenon we call "normal typicality" [22]: under conditions that are "typically"
satisfied, every wave function $\psi_0$ from a micro-canonical energy shell displays the same
"normal" long-time behavior, viz., for most times $t$ the "macroscopic appearance" of $\psi_t$
is the same as that of the micro-canonical ensemble. Here, the macroscopic appearance
is expressed in terms of von Neumann's concept of macroscopic observables, which was
developed for the first time (as far as we are aware) in the QET article and which we
will outline in Section 2.2.

The QET provides a condition under which "normal" long-time behavior occurs,
and it also says that this condition is satisfied for most finite families of commuting
macroscopic observables (or, in fact, for most Hamiltonians [22]). It is thus perhaps
the first typicality theorem in quantum mechanics. Typicality theorems, i.e., statements
about most wave functions or most observables or most Hamiltonians, are now widely
used. They were crucial to Wigner's work on random Hamiltonians in nuclear physics in
the 1950s [71] and are currently used in a great variety of "random" systems. Typicality
has also been used in recent years in the context of canonical typicality (i.e., the fact
that, for most wave functions from a narrow energy shell of a large system, the reduced
density matrix of a small subsystem is approximately canonical) [?, ?, ?, ?]. For
other uses of typicality see, e.g., [25, 26, ?, 62, ?, ?].

When the QET article was published in 1929, Schrödinger wrote an enthusiastic
letter to von Neumann [60]. Among other things, he wrote:



Your statistical paper has been of extraordinary interest to me, I am very
happy about it, and I'm particularly happy about the gorgeous clarity and
sharpness of the concepts and about the careful bookkeeping of what has
been achieved at every point.¹

Schrödinger had previously published work [61] on thermodynamic properties of
macroscopic quantum systems that one would nowadays regard as a precursor of canonical
typicality. A few years later, Pauli and Fierz [50] published an alternative proof of the
QET which, however, yields weaker error bounds than von Neumann's proof. During
the 1930s, the QET was also mentioned in expositions of the foundations of quantum
statistical mechanics by Kemble [31] and Tolman [66, p. 472] (who misattributed it,
though, to Pauli and Fierz).

In the 1950s, two articles appeared expressing sharp criticisms of the QET, one
by Farquhar and Landsberg [15] and one by Bocchieri and Loinger [5]. They claimed
to have "mathematically proved the inadequacy of von Neumann's approach" [5] and
that "the von Neumann approach is unsatisfactory" [15]. The authors repeated their
criticisms in later publications [?, ?, ?], calling the QET "essentially wrong" [34],
"seriously flawed" [34], and "devoid of dynamical content" [?, p. 166]. However, in
these works the QET was mixed up with other statements that indeed are devoid of
dynamical content, and the criticisms do not apply to the original QET. Unfortunately,



1 Translated from the German by R. Tumulka. 
this misunderstanding was not pointed out until recently [22]; in the 1950s and 1960s,
the negative assessment of Farquhar, Landsberg, Bocchieri, and Loinger was widely
cited and trusted (e.g., [13, ?, ?, ?, ?, ?]). In 1962 Ludwig expressed the
widespread view in this way [45]:

A short time after the development of quantum mechanics, J. von Neumann
has given a proof of some kind of ergodic theorem. [...] After there
was shown by papers of Landsberg and Farquhar and then definitively by 
a very clear paper of Bocchieri and Loinger that this proof is a physically 
meaningless one, it is superfluous to go deeper into this proof. 

As a consequence, the QET was undeservedly forgotten. We elaborate on the nature of
the misunderstanding in [22] and in Section 4 below. We note, however, that in the 1966
review by Bocchieri and Prosperi [7] of the development of ergodic theory in quantum
mechanics, no criticism of von Neumann based on this misunderstanding is made.

The remainder of this paper is organized as follows. In Section 2 we give a qualitative
summary of the QET. In Section 3 we compare the QET with the situation in classical
mechanics and the concept of ergodicity. In Section 4 we describe the nature of the
widespread misunderstanding of the QET from the 1950s onwards. In Section 5 we
review different definitions of thermal equilibrium and compare the QET to recent works
on the approach to thermal equilibrium. In Section 6 we discuss the general relevance
of typicality theorems. In Section 7 we discuss the contents and significance of von
Neumann's "quantum H-theorem." Because the statement of the QET in the QET
article is distributed over several places, we formulate it in Appendix A as a concise and
precise mathematical theorem. In Appendix B we provide a table with von Neumann's
notation and elucidate some of his terminology.



2 Qualitative Summary of the Quantum Ergodic Theorem

2.1 Setting 

Von Neumann considered a macroscopic quantum system, confined to a finite volume
of space. For the sake of concreteness, we suggest that readers think of a system of $N$
interacting particles, where $N$ is very large (usually larger than $10^{20}$), in a box $\Lambda \subset \mathbb{R}^3$.
The wave function $\psi_t = \psi(q_1, \dots, q_N, t)$ evolves according to the Schrödinger equation

$$ i\hbar \frac{\partial \psi_t}{\partial t} = H \psi_t \qquad (1) $$

with $H$ the Hamiltonian of the system. (Von Neumann used the opposite sign in the
Schrödinger equation, writing $-i$ instead of $i$. The form (1) is nowadays standard.) As
usual, $\psi_0$ (and thus $\psi_t$) should be normalized,

$$ \|\psi_0\|^2 = \int_\Lambda d^3q_1 \cdots \int_\Lambda d^3q_N \, |\psi_0(q_1, \dots, q_N)|^2 = 1. \qquad (2) $$
It follows from the confinement to a finite volume that $H$ has discrete energy levels, which
we denote $E_\alpha$ (see Appendix B for a list of von Neumann's notation). Let $\{\phi_\alpha\}$ be an
orthonormal basis of the system's total Hilbert space $\mathscr{H}_{\text{total}}$ consisting of eigenfunctions
of $H$,

$$ H \phi_\alpha = E_\alpha \phi_\alpha. \qquad (3) $$

Considering only Hamiltonians that are bounded from below, there will be only finitely
many eigenvalues (with multiplicity) below any given value, so we can order them so
that $E_0 \le E_1 \le E_2 \le \dots$.

Von Neumann considered further (what amounts to) a partition of the energy axis,
or rather of the relevant half-axis $[E_0, \infty)$, into disjoint intervals $I_a = [\mathscr{E}_a, \mathscr{E}_{a+1})$ (with
$\mathscr{E}_0 = E_0$ and $\mathscr{E}_a < \mathscr{E}_{a+1}$) that are large on the microscopic scale (so that each contains
many eigenvalues $E_\alpha$), but small on the macroscopic scale (so that different energies
in one interval are not macroscopically different).² Such an interval is called a
micro-canonical energy shell, an expression that is also often used to refer to the subspace
$\mathscr{H}_a \subseteq \mathscr{H}_{\text{total}}$ spanned by the $\phi_\alpha$ with $E_\alpha \in I_a$.
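As a toy illustration of this bookkeeping, the following Python sketch (standard library only) counts, for a given list of eigenvalues and a strictly increasing list of interval endpoints $\mathscr{E}_0 < \mathscr{E}_1 < \dots$, how many eigenvalues (with multiplicity) fall into each half-open shell $I_a$; these counts are the dimensions of the shells $\mathscr{H}_a$. The function name `shell_dimensions` and the numbers in the example are hypothetical; a realistic spectrum would of course have astronomically many levels per shell.

```python
from bisect import bisect_right


def shell_dimensions(eigenvalues, edges):
    """Count eigenvalues (with multiplicity) in each shell I_a = [edges[a], edges[a+1]).

    edges must be strictly increasing; eigenvalues outside
    [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for energy in eigenvalues:
        a = bisect_right(edges, energy) - 1  # largest a with edges[a] <= energy
        if 0 <= a < len(counts):
            counts[a] += 1
    return counts


# Made-up spectrum with one doubly degenerate level (0.1) and three shells.
eigenvalues = [0.0, 0.1, 0.1, 0.25, 0.4, 0.5, 0.9]
edges = [0.0, 0.3, 0.6, 1.0]
print(shell_dimensions(eigenvalues, edges))  # [4, 2, 1]
```

Using `bisect_right` makes the intervals closed on the left and open on the right, matching $I_a = [\mathscr{E}_a, \mathscr{E}_{a+1})$: an eigenvalue equal to an endpoint is assigned to the shell that starts there.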



2.2 Macroscopic Observables 

We now turn to the mathematical structure that encodes the concept of "macroscopic"
in von Neumann's article: a decomposition of the Hilbert space $\mathscr{H}$ into mutually
orthogonal subspaces $\mathscr{H}_\nu$,

$$ \mathscr{H} = \bigoplus_\nu \mathscr{H}_\nu, \qquad (4) $$

such that each $\mathscr{H}_\nu$ corresponds to a different macro-state $\nu$. We call the $\mathscr{H}_\nu$ the
"macro-spaces" and write $\mathscr{D}$ for the family $\{\mathscr{H}_\nu\}$ of subspaces, called a "macro-observer" in von
Neumann's paper, and $P_\nu$ for the projection to $\mathscr{H}_\nu$. We use the notation

$$ d_\nu = \dim \mathscr{H}_\nu. \qquad (5) $$

For the sake of simplicity, we focus on only one micro-canonical interval $I_a =
[\mathscr{E}_a, \mathscr{E}_{a+1})$ and simply consider $\mathscr{H}_a$, rather than $\mathscr{H}_{\text{total}}$, as our Hilbert space $\mathscr{H}$. In
particular, we take (4) to be a decomposition of the energy shell $\mathscr{H}_a$; this decomposition
can be regarded as analogous to a partition of the energy shell in a classical phase
space. Let $D = \dim \mathscr{H}$, i.e., the number of energy levels, including multiplicities,
between $\mathscr{E}_a$ and $\mathscr{E}_{a+1}$. This number is finite but huge, usually greater than $10^{10^{20}}$ when
the number $N$ of particles is greater than $10^{20}$.

The micro-canonical density matrix $\rho_{\mathrm{mc}}$ is the projection to $\mathscr{H}$ times a normalization
factor $1/D$,

$$ \rho_{\mathrm{mc}} = \frac{1}{D} \sum_{\alpha:\, E_\alpha \in I_a} |\phi_\alpha\rangle\langle\phi_\alpha|, \qquad (6) $$

2 In von Neumann's words (Section 1.2): "With a certain (reduced) accuracy, [it is] possible [to 
measure energy with macroscopic means], so that the energy eigenvalues [...] can be collected in groups 
[...] in such a way that all [eigenvalues in the same group] are close to each other and only those [in 
different groups] can be macroscopically distinguished." 
and the micro-canonical average of an observable $A$ on $\mathscr{H}$ is given by

$$ \operatorname{tr}(\rho_{\mathrm{mc}} A) = \frac{\operatorname{tr} A}{D}. \qquad (7) $$

This value can also be obtained as the average of the values $\langle\phi_\alpha|A|\phi_\alpha\rangle$, with equal
weights $1/D$, over those $\alpha$ with $E_\alpha \in [\mathscr{E}_a, \mathscr{E}_{a+1})$. Alternatively, it can also be obtained
as the average of the values $\langle\varphi|A|\varphi\rangle$ with $\varphi$ uniformly distributed over the unit sphere

$$ \mathbb{S}(\mathscr{H}) = \{\varphi \in \mathscr{H} : \|\varphi\| = 1\}. \qquad (8) $$

For this uniform distribution, the probability that $\varphi \in B \subseteq \mathbb{S}(\mathscr{H})$ is the $(2D-1)$-dimensional
surface area of $B$ times a normalization factor.

Von Neumann motivated the decomposition (4) by beginning with a family of operators
corresponding to coarse-grained macroscopic observables and arguing that by
"rounding" the operators, the family can be converted to a family of operators $M_1, \dots, M_\ell$
that commute with each other, have pure point spectrum, and have huge degrees of
degeneracy. A macro-state can then be characterized by a list $\nu = (m_1, \dots, m_\ell)$ of
eigenvalues $m_i$ of the $M_i$, and corresponds to the subspace $\mathscr{H}_\nu \subseteq \mathscr{H}_{\text{total}}$ containing the
simultaneous eigenvectors of the $M_i$ with eigenvalues $m_i$; that is, $\mathscr{H}_\nu$ is the intersection
of the respective eigenspaces of the $M_i$ and $d_\nu$ is the degree of simultaneous degeneracy
of the eigenvalues $m_1, \dots, m_\ell$.³ If any of the $\mathscr{H}_\nu$ has dimension 0, i.e., if a particular
combination of eigenvalues of the $M_i$ does not occur, then we delete it from the family $\mathscr{D}$.

As an example of the "rounding" of macroscopic observables, von Neumann points
out that when we simultaneously measure the position and momentum of a macroscopic
body, the experiment corresponds not to the exact center-of-mass position and total
momentum observables but to two commuting observables approximating these.⁴
Correspondingly, the distance between neighboring eigenvalues of $M_i$ represents the
inaccuracy of the measurement. As further examples of macro-observables, we may consider
similar approximations to the number of particles in the left half of the box $\Lambda$ divided
by the total number of particles, or to the $z$-component of the magnetization (i.e., the
total magnetic $z$-moment $\sum_{i=1}^{N} \sigma_{z,i}$, where $\sigma_{z,i}$ is the third Pauli matrix acting on the
$i$-th particle).

Von Neumann's reasoning, that macroscopic observables can be taken to commute,
has inspired research about whether for given operators $A_1, \dots, A_\ell$ whose commutators
are small one can find approximations $M_i \approx A_i$ that commute exactly; the answer
is, for $\ell \ge 3$ and general $A_1, \dots, A_\ell$, no [10]. Also for two operators, the question is
mathematically non-trivial; for recent results see [?, 57]; for an overview see [?]. Von

3 For a notion of macro-spaces that does not require that the corresponding macro-observables
commute, see [?], in particular Section 2.1.1.

4 In von Neumann's words (Section 0.2): "In a macroscopic measurement of coordinate and
momentum (or two other quantities that cannot be measured simultaneously according to quantum mechanics),
really two physical quantities are measured simultaneously and exactly, which however are not exactly
coordinate and momentum."
Neumann gave an example of such an approximation in his Section 0.2, starting from
the position and momentum operators $Q_k$ and $P_k$ of quantum mechanics and providing
new operators $Q'_k$ and $P'_k$ that commute; however, it is perhaps not a good example
because, somewhat contrary to what he suggested in his Section 0.2, it is not clear why
$Q_k - Q'_k$ or $P_k - P'_k$ should have small operator norm (or which other norm could be
relevant) [E. Carlen and M. Hastings, personal communication].⁵
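Because commuting observables with pure point spectrum have a common orthonormal eigenbasis, the passage from $M_1, \dots, M_\ell$ to the macro-spaces $\mathscr{H}_\nu$ is pure bookkeeping in that basis: group the basis vectors by their tuple of eigenvalues $\nu = (m_1, \dots, m_\ell)$. The Python sketch below (standard library only; all names are hypothetical) assumes such a basis is already given and does not model the "rounding" step. Its toy example uses 4 spins, far too few to be macroscopic, with two observables loosely modeled on the magnetization example in the text: the total $z$-moment and the number of up spins in the left half.

```python
from collections import defaultdict
from itertools import product


def macro_spaces(basis_labels, eigenvalue_maps):
    """Group joint eigenvectors by nu = (m_1, ..., m_l), their tuple of eigenvalues.

    basis_labels: labels of a common orthonormal eigenbasis of M_1, ..., M_l.
    eigenvalue_maps: one function per M_i, returning its eigenvalue on a label.
    Returns {nu: labels spanning H_nu}. Eigenvalue combinations that never occur
    simply do not appear, i.e., zero-dimensional H_nu are dropped from the family.
    """
    spaces = defaultdict(list)
    for label in basis_labels:
        nu = tuple(m(label) for m in eigenvalue_maps)
        spaces[nu].append(label)
    return dict(spaces)


N = 4  # toy "system" of 4 spins
basis = list(product((0, 1), repeat=N))  # 1 = spin up, 0 = spin down


def total_z(s):
    """Eigenvalue of the total z-moment sum_i sigma_{z,i} on basis state s."""
    return sum(1 if up else -1 for up in s)


def left_up(s):
    """Number of up spins in the left half (the first N // 2 spins)."""
    return sum(s[: N // 2])


spaces = macro_spaces(basis, [total_z, left_up])
dims = {nu: len(labels) for nu, labels in sorted(spaces.items())}
print(dims)
```

Here, e.g., $\nu = (0, 1)$ has $d_\nu = 4$, the $d_\nu$ add up to $D = 2^4 = 16$, and a combination such as $(-4, 1)$ is absent because no basis state realizes it.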

When von Neumann considered the commuting macro-observables $M_1, \dots, M_\ell$, he
had in mind that one of them, say $M_1$, is the "macroscopic energy," which can be
thought of as obtained from $H$ by coarse-graining in agreement with the partition of
the energy axis into the micro-canonical intervals $I_a$,

$$ M_1 = \sum_\alpha f_1(E_\alpha)\, |\phi_\alpha\rangle\langle\phi_\alpha|, \qquad (9) $$

with $f_1$ the appropriate step function given by

$$ f_1(E) = \frac{\mathscr{E}_a + \mathscr{E}_{a+1}}{2} \quad \text{for } E \in I_a = [\mathscr{E}_a, \mathscr{E}_{a+1}). \qquad (10) $$

Since the $M_i$ commute with one another, every $M_i$ commutes with the coarse-grained
energy $M_1$, but generally not with $H$, so it is generally not a conserved quantity.⁶
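A minimal Python sketch of (9) and (10), working in the energy eigenbasis $\{\phi_\alpha\}$ so that both $H$ and $M_1$ are diagonal and can be represented by their lists of eigenvalues. The function names and the sample numbers are hypothetical; the point is only that $M_1$ shares the eigenbasis of $H$ but lumps each shell into a single, highly degenerate eigenvalue.

```python
from bisect import bisect_right


def macroscopic_energy(E, edges):
    """f_1(E) from (10): midpoint of the shell I_a = [edges[a], edges[a+1]) containing E."""
    a = bisect_right(edges, E) - 1
    if not 0 <= a < len(edges) - 1:
        raise ValueError("E lies outside the partitioned energy range")
    return (edges[a] + edges[a + 1]) / 2


def coarse_grained_spectrum(eigenvalues, edges):
    """Diagonal of M_1 in (9): f_1(E_alpha) for each eigenvalue E_alpha of H."""
    return [macroscopic_energy(E, edges) for E in eigenvalues]


eigenvalues = [0, 1, 2.5, 4, 9]  # made-up spectrum of H
edges = [0, 3, 6, 10]            # made-up shell boundaries E_0 < E_1 < ...
print(coarse_grained_spectrum(eigenvalues, edges))  # [1.5, 1.5, 1.5, 4.5, 8.0]
```

Integer shell boundaries are used in the example only so that the midpoints are exactly representable in floating point.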

Since all $M_i$ commute (in particular with $M_1$), every $M_i$ maps the energy shell $\mathscr{H}_a$ to itself, and we can (and
will) regard the $M_i$ as operators on $\mathscr{H} = \mathscr{H}_a$. Thus, our assumption is satisfied that each
of the macro-spaces $\mathscr{H}_\nu$ either lies in or is orthogonal to $\mathscr{H}_a$. The size of $d_\nu$ is in
practice also of the rough order $10^{10^{20}}$, though often very much smaller than $D$. (Note
that, e.g., $10^{0.9999 \times 10^{20}}$ is smaller than $10^{10^{20}}$ by a factor of $10^{10^{16}}$.)



2.3 Statement of the Quantum Ergodic Theorem 

We now have the ingredients ($H$, $\mathscr{H}$, and $\mathscr{D}$) to formulate the QET. Despite the name,
the property described in the QET is not precisely analogous to the standard notion
of ergodicity as used in classical mechanics and the mathematical theory of dynamical
systems. That is why we prefer to call quantum systems with the relevant property
"normal" rather than "ergodic."⁷ Let us proceed towards a description of this property.

5 Concerning his construction of $Q'_k$ and $P'_k$, we also note that the constant $C$ (that controls the
spread of the joint eigenfunctions $\psi_n$ of $Q'_k$ and $P'_k$ in the position and momentum representations) was
specified by von Neumann in Footnote 9 of his 1929 QET article as "$C < 3.6$" but in his 1932 book [73]
as "$C \sim 60$," so maybe the bound $C < 3.6$ was incorrectly calculated. Bourgain [9] has improved the
constant to $C = 1$; i.e., he has found another basis $\varphi_n$ that achieves with $C = 1$ (the bound suggested
by the uncertainty principle) what von Neumann asserted of his choice of basis $\psi_n$ described in his
Footnote 10.

6 In von Neumann's words (Section 1.2): "In general, [...] $H$ is not a linear combination of the
[projections to the joint eigenspaces of all $M_i$], since the energy is not a macroscopic quantity, as it
cannot be measured with absolute precision with macroscopic means" [as the $M_i$ can].

7 This terminology is inspired by the concept of a normal real number, which is "a real number whose 
digits in every base show a uniform distribution, with all digits being equally likely, all pairs of digits 
equally likely, all triplets of digits equally likely, etc. While a general proof can be given that almost
all numbers are normal, this proof is not constructive [...]. It is for instance widely believed that the
numbers √2, π, and e are normal, but a proof remains elusive." [17]
Any wave function $\psi \in \mathscr{H}$ with $\|\psi\| = 1$ defines a probability distribution over all
macro-states $\nu$; namely, the probability associated with $\nu$ is

$$ \|P_\nu \psi\|^2 = \langle\psi|P_\nu|\psi\rangle. \qquad (11) $$

(Recall that $P_\nu$ is the projection to $\mathscr{H}_\nu$.) This is the probability of obtaining, in a
joint measurement of the macro-observables $M_1, \dots, M_\ell$ on a system in state $\psi$, the
outcomes $(m_1, \dots, m_\ell)$ corresponding to $\nu$. Similarly, the micro-canonical density matrix
$\rho_{\mathrm{mc}}$ defines a probability distribution over all macro-states $\nu$; namely, the probability
associated with $\nu$ is

$$ \operatorname{tr}(\rho_{\mathrm{mc}} P_\nu) = \frac{d_\nu}{D}. \qquad (12) $$

Claim 1. For most wave functions $\psi$ from the unit sphere in the micro-canonical
subspace $\mathscr{H}$, the distribution (11) associated with $\psi$ is close to the micro-canonical
distribution (12).

The reference to "most" is intended to convey that the subset of the unit sphere in
$\mathscr{H}$ containing those $\psi$ for which (11) is close (in some precise sense) to (12) has measure
arbitrarily close to 1, provided each of the $d_\nu$ is sufficiently large. Here, the "measure"
corresponds to the uniform distribution over the unit sphere. Claim 1 follows from the
fact, proven by von Neumann in his appendices A.1-A.3, that if $\mathscr{H}_\nu$ is any fixed subspace
of $\mathscr{H}$ of dimension $d_\nu$ and $\varphi$ is a random vector with uniform distribution on the unit
sphere in $\mathscr{H}$, then

$$ \mathbb{E}\|P_\nu\varphi\|^2 = \frac{d_\nu}{D}, \qquad \operatorname{Var}\|P_\nu\varphi\|^2 = \mathbb{E}\Big(\|P_\nu\varphi\|^2 - \frac{d_\nu}{D}\Big)^2 < \frac{1}{d_\nu}\Big(\frac{d_\nu}{D}\Big)^2. \qquad (13) $$

Here, $\mathbb{E}$ denotes the expected value and $\operatorname{Var}$ the variance of a random variable. Thus, the
first relation in (13) says that the value (11), when averaged over the
unit sphere, yields the micro-canonical value (12), and the second relation says that
the standard deviation of the random variable $\|P_\nu\varphi\|^2$ is small, in fact much smaller
than its average, provided $d_\nu \gg 1$. It then follows from Chebyshev's inequality that the
probability that $\|P_\nu\varphi\|^2$ deviates much from its expectation value $d_\nu/D$ is small. That
is, in the language of measure theory, the set of $\varphi$s for which $\|P_\nu\varphi\|^2$ deviates much
from the micro-canonical value is small, which was what was claimed.
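To make (13) concrete, here is a minimal Monte Carlo sketch in Python (standard library only). It samples $\varphi$ uniformly from the unit sphere of $\mathbb{C}^D$ by normalizing a vector of independent complex Gaussians (a standard sampling method, not von Neumann's argument) and, since this distribution is invariant under unitaries, takes $\mathscr{H}_\nu$ without loss of generality to be the span of the first $d_\nu$ basis vectors. The sizes $D = 200$, $d_\nu = 20$, the sample count, and the seed are arbitrary illustrative choices. For comparison, the code also computes the exact variance $d_\nu(D - d_\nu)/(D^2(D+1))$, which follows from the fact that $\|P_\nu\varphi\|^2$ is then Beta$(d_\nu, D - d_\nu)$-distributed.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample phi uniformly from the unit sphere of C^dim.

    Normalizing a vector of i.i.d. standard complex Gaussians gives a
    distribution that is invariant under all unitaries, i.e., uniform.
    """
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in z))
    return [x / norm for x in z]


def weight_in_subspace(phi, d):
    """||P phi||^2 for P the projection onto the span of the first d basis vectors."""
    return sum(abs(x) ** 2 for x in phi[:d])


def sample_weights(D, d, samples, seed=0):
    """Empirical mean and variance of ||P phi||^2 over uniformly random phi."""
    rng = random.Random(seed)
    values = [weight_in_subspace(random_unit_vector(D, rng), d) for _ in range(samples)]
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / samples
    return mean, var


D, d = 200, 20
mean, var = sample_weights(D, d, samples=4000)
exact_var = d * (D - d) / (D**2 * (D + 1))  # variance of Beta(d, D - d)
bound = (d / D) ** 2 / d                     # right-hand side of (13)
print(f"mean {mean:.4f}   d/D {d / D:.4f}")
print(f"var  {var:.3e}   exact {exact_var:.3e}   bound {bound:.3e}")
```

Combined with Chebyshev's inequality, (13) gives, for instance, that the probability of a relative deviation $|\|P_\nu\varphi\|^2 - d_\nu/D| \ge \varepsilon\, d_\nu/D$ is less than $1/(\varepsilon^2 d_\nu)$, which is tiny for realistic $d_\nu$.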

As a consequence of Claim 1, most wave functions $\psi$ are such that for each of the
macroscopic observables $M_1, \dots, M_\ell$ (and, in fact, for every function $f(M_1, \dots, M_\ell)$,
i.e., for every element of the algebra generated by $M_1, \dots, M_\ell$), the probability
distribution that $\psi$ defines on the spectrum of the observable is close to the one defined by
the micro-canonical density matrix. Put loosely, most pure states in $\mathscr{H}$, when looked
at macroscopically, look like the micro-canonical mixed state. It is clear that Claim
1 cannot be true for all (rather than most) wave functions, as one can easily provide
examples of wave functions whose distribution is not close to the micro-canonical one:
say, $\psi \in \mathscr{H}_\nu$ for one particular $\nu$.






Let us consider now the time evolution of some initial $\psi_0$ and ask whether

$$ \|P_\nu \psi_t\|^2 \approx \frac{d_\nu}{D} \quad \text{for all } \nu \qquad (14) $$

will hold for most times $t$.⁸ This may seem like a plausible behavior in view of Claim
1. In fact, from Claim 1 it follows rather easily that (14) holds for most initial wave
functions $\psi_0$ and most times $t$ (because the unitary time evolution preserves the uniform
distribution on the unit sphere, so that $\psi_t$ is uniformly distributed whenever $\psi_0$ is). The QET goes further. It asserts that, for certain
systems, (14) holds for all initial wave functions $\psi_0$ for most times $t$. This is important
because one may expect most wave functions to represent microscopic states of thermal
equilibrium, while states of non-equilibrium should form a very small minority. Thus,
if we are interested in the evolution towards equilibrium, we are specifically interested
in the question whether non-equilibrium states will evolve towards equilibrium, and
hence we cannot be satisfied with statements about most wave functions because such
statements need not apply to the non-equilibrium wave functions.

Let us put this differently. We call a system, defined by $H$, $\mathscr{H}$, $\mathscr{D}$, and $\psi_0 \in \mathscr{H}$,
normal if and only if (14) holds for most $t$. The QET provides conditions under which a
system is normal for every initial state vector $\psi_0$. Furthermore, the QET asserts normal
typicality, i.e., that typical macroscopic systems are normal for every $\psi_0$; more precisely,
that for most choices of $\mathscr{D}$,⁹ macroscopic systems are normal for every $\psi_0$. The result is,
in fact, equivalent to the statement that for most Hamiltonians, macroscopic systems
are normal for every $\psi_0$ [22]. It thus provides reasons to believe (but no proof) that
macroscopic systems in practice are normal.

Before we give an informal statement of the QET, we formulate two conditions
involved in it. First, following von Neumann, we say that a Hamiltonian $H$ with
eigenvalues $E_1, \dots, E_D$ has no resonances if and only if

$$ E_\alpha - E_\beta \neq E_{\alpha'} - E_{\beta'} \quad \text{unless} \quad \begin{cases} \text{either } \alpha = \alpha' \text{ and } \beta = \beta', \\ \text{or } \alpha = \beta \text{ and } \alpha' = \beta'. \end{cases} \qquad (15) $$

In words, this means that the energy differences are non-degenerate. It implies in
particular that the energy levels themselves are non-degenerate, but is a stronger condition.
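Condition (15) is easy to test for a finite list of eigenvalues: it holds exactly when the ordered gaps $E_\alpha - E_\beta$ with $\alpha \neq \beta$ are pairwise distinct (a degenerate level shows up as the gap 0 occurring twice, once from $(\alpha, \beta)$ and once from $(\beta, \alpha)$). The Python sketch below sorts the gaps and compares neighbors; the function name and the floating-point tolerance `tol` are assumptions of this sketch, since exact equality of real energies is not numerically meaningful.

```python
def has_no_resonances(energies, tol=1e-9):
    """Check condition (15) for a finite list of eigenvalues.

    (15) holds iff the ordered gaps E_a - E_b (a != b) are pairwise distinct:
    two different pairs sharing a gap is a resonance, and a degenerate level
    E_a == E_b produces the gap 0 twice (from (a, b) and (b, a)).
    """
    gaps = sorted(ea - eb
                  for a, ea in enumerate(energies)
                  for b, eb in enumerate(energies)
                  if a != b)
    return all(g2 - g1 > tol for g1, g2 in zip(gaps, gaps[1:]))


print(has_no_resonances([0, 1, 4, 6]))  # True
print(has_no_resonances([0, 1, 2]))     # False
```

The levels $0, 1, 4, 6$ pass because all their pairwise differences are distinct, while equally spaced levels fail, since the gap $E_1 - E_0$ equals $E_2 - E_1$.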
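The other condition is a technical one and can be stated as follows. For a given $H$ and
$\mathscr{D} = \{\mathscr{H}_\nu\}$, let

$$ F_\nu(H, \mathscr{D}) = \max_{\alpha \neq \beta} |\langle\phi_\alpha|P_\nu|\phi_\beta\rangle|^2 + \max_\alpha \Big(\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D}\Big)^2. \qquad (16) $$

The condition, to which we will simply refer as "condition (17)," is that

$$ F_\nu(H, \mathscr{D}) \text{ is sufficiently small for every } \nu. \qquad (17) $$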



8 When saying "most t" we have in mind most t > 0, but the QET and our other statements are 
equally true for most t < 0, as long as the system was and remains isolated. 

9 The concept of "most $\mathscr{D}$" refers to the uniform distribution over all orthogonal decompositions (4)
such that $\dim \mathscr{H}_\nu = d_\nu$. When talking about "most Hamiltonians" we refer to the uniform distribution
over all Hamiltonians with given eigenvalues. Both distributions are marginals of images of the Haar
measure over the group of unitary $D \times D$ matrices; for their full definitions see [22] or the QET article.
Informal statement of the QET. (For a fully precise statement see Appendix A
below.) Let $\mathscr{H}$ be any Hilbert space of finite dimension $D$, let $\mathscr{D} = \{\mathscr{H}_\nu\}$ be an orthogonal
decomposition of $\mathscr{H}$ with $\dim \mathscr{H}_\nu = d_\nu$, and let the Hamiltonian $H$ be a self-adjoint
operator on $\mathscr{H}$ without resonances. If $H$ and $\mathscr{D}$ satisfy condition (17) then, for every
wave function $\psi_0 \in \mathscr{H}$ with $\|\psi_0\| = 1$, the system is normal, i.e., (14) holds most of the
time. Moreover, for sufficiently large $d_\nu$'s with $\sum_\nu d_\nu = D$, most families $\mathscr{D} = \{\mathscr{H}_\nu\}$
of mutually orthogonal subspaces $\mathscr{H}_\nu$ with $\dim \mathscr{H}_\nu = d_\nu$ are such that condition (17) is
satisfied (and thus the system is normal for every $\psi_0$).

(It is easy to understand the role of the second term on the right-hand side of (16):
When we want to ensure that the system is normal for every wave function, then we
need to ensure that it is normal for energy eigenfunctions $\phi_\alpha$. Since the time evolution of $\phi_\alpha$
is stationary, it can only be normal if $\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx d_\nu/D$, which corresponds to the
smallness of the second term in (16).)

Here is another way of expressing the QET. Let us denote the long-time average of
a function $f(t)$ by an overbar,

$$ \overline{f(t)} = \lim_{T \to \infty} \frac{1}{T} \int_0^T dt\, f(t). \qquad (18) $$

All statements remain valid if we include negative times and set

$$ \overline{f(t)} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} dt\, f(t). \qquad (19) $$

Then a system is normal if, for every $\nu$, the time average $\overline{\|P_\nu\psi_t\|^2}$ is close to $d_\nu/D$ and
the time variance of $\|P_\nu\psi_t\|^2$ is small; equivalently, a system is normal if, for every $\nu$,
the expression

$$ \Big(\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big)^2 \qquad (20) $$

has small time average. The QET asserts that the time average of (20) is no greater
than (16) (independently of $\psi_0$); and, moreover, that the average of (16) over all $\mathscr{D}$ with
$\dim \mathscr{H}_\nu = d_\nu$ is small when $d_\nu$ is sufficiently large.
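The first assertion can be checked by hand and numerically. Writing $\psi_0 = \sum_\alpha c_\alpha \phi_\alpha$ and $P_{\alpha\beta} = \langle\phi_\alpha|P_\nu|\phi_\beta\rangle$, and using (15) to discard oscillating terms (for a non-resonant $H$, a product of phase factors $e^{i(E_\alpha - E_\beta)t/\hbar}\, e^{i(E_{\alpha'} - E_{\beta'})t/\hbar}$ with $\alpha \neq \beta$, $\alpha' \neq \beta'$ has non-zero time average only if $\alpha' = \beta$ and $\beta' = \alpha$), one finds

$$ \overline{\Big(\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big)^2} = \Big(\sum_\alpha |c_\alpha|^2 P_{\alpha\alpha} - \frac{d_\nu}{D}\Big)^2 + \sum_{\alpha \neq \beta} |c_\alpha|^2 |c_\beta|^2 |P_{\alpha\beta}|^2, $$

and since the weights $|c_\alpha|^2$ sum to 1, each term is bounded by the corresponding term of (16). The Python sketch below (standard library only; all names hypothetical) works directly in the energy eigenbasis, draws $\mathscr{H}_\nu$ as a random $d_\nu$-dimensional subspace via Gram-Schmidt on complex Gaussian vectors, and evaluates both sides for a far-from-equilibrium $\psi_0$ lying entirely in $\mathscr{H}_\nu$. The energies themselves do not enter; the sketch only assumes that they have no resonances. The sizes and seed are arbitrary.

```python
import math
import random


def random_subspace_basis(D, d, rng):
    """Orthonormal basis of a random d-dimensional subspace of C^D (Gram-Schmidt)."""
    basis = []
    while len(basis) < d:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
        for u in basis:
            overlap = sum(x.conjugate() * y for x, y in zip(u, v))  # <u|v>
            v = [y - overlap * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(abs(y) ** 2 for y in v))
        basis.append([y / norm for y in v])
    return basis


def projector_elements(basis, D):
    """P[a][b] = <phi_a|P_nu|phi_b>, with P_nu = sum_k |u_k><u_k| in the energy basis."""
    return [[sum(u[a] * u[b].conjugate() for u in basis) for b in range(D)]
            for a in range(D)]


def F_nu(P, d, D):
    """The quantity (16)."""
    off_diag = max(abs(P[a][b]) ** 2 for a in range(D) for b in range(D) if a != b)
    diag = max((P[a][a].real - d / D) ** 2 for a in range(D))
    return off_diag + diag


def time_averaged_deviation(P, c, d, D):
    """Long-time average of (20) for coefficients c_alpha, assuming no resonances."""
    w = [abs(x) ** 2 for x in c]
    mean_dev = sum(w[a] * P[a][a].real for a in range(D)) - d / D
    fluct = sum(w[a] * w[b] * abs(P[a][b]) ** 2
                for a in range(D) for b in range(D) if a != b)
    return mean_dev ** 2 + fluct


rng = random.Random(1)
D, d = 60, 20
basis = random_subspace_basis(D, d, rng)
P = projector_elements(basis, D)
c = basis[0]  # psi_0 in H_nu: ||P_nu psi_0||^2 = 1, far from d/D = 1/3
initial = (1 - d / D) ** 2
avg = time_averaged_deviation(P, c, d, D)
bound = F_nu(P, d, D)
print(f"initial {initial:.3f}   time average {avg:.4f}   F_nu {bound:.4f}")
```

Even though $\psi_0$ starts with the largest possible deviation, the time average of (20) comes out far below its initial value and below $F_\nu$, as the QET asserts.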

More detailed discussions of the QET have been provided by Pauli and Fierz in 1937
and by Jancel in 1963 [30]; see also [?].



3 Comparison With Classical Mechanics 

For a classical Hamiltonian system, we denote a point in phase space by

$$ X = (q_1, \dots, q_N, p_1, \dots, p_N). \qquad (21) $$

The time evolution of the micro-state $X$ is given by the solution of the Hamiltonian
equations of motion, which sends $X$ (at time 0) to $X_t$ (at time $t$), $t \in \mathbb{R}$. This dynamics
preserves the Liouville phase-space volume.
Instead of the orthogonal decomposition of $\mathscr{H}$ into subspaces $\mathscr{H}_\nu$, we consider a
partition of an energy shell $\Gamma$ in phase space $\mathbb{R}^{6N}$,

$$ \Gamma = \{X : \mathscr{E}_a \le H(X) < \mathscr{E}_{a+1}\}, \qquad (22) $$

into regions $\Gamma_\nu$, corresponding to different macro-states $\nu$,

$$ \Gamma = \bigcup_\nu \Gamma_\nu, \qquad (23) $$

i.e., if the micro-state $X$ of the system is in $\Gamma_\nu$ then the macro-state of the system is
$\nu$. Let $\mu_{\mathrm{mc}}$ denote the micro-canonical distribution, i.e., the uniform distribution (=
normalized Liouville phase space volume) on $\Gamma$. Then with each macro-state $\nu$ there is
associated the micro-canonical probability $\mu_{\mathrm{mc}}(\Gamma_\nu)$.

A crucial difference between a quantum and a classical system is that $\psi$ can be a
superposition of contributions from several $\mathscr{H}_\nu$'s whereas $X$ always lies in one and only
one of the $\Gamma_\nu$'s. As a consequence, a single phase point $X$ does not provide a nontrivial
probability distribution over the $\nu$'s, and there is no statement analogous to (14) in
classical mechanics. One can only ask about the fraction of time that $X_t$ spends in
various $\Gamma_\nu$'s, and to this question we turn in the following subsection.



3.1 Ergodicity 

As we mentioned already, normality — the property relevant to the QET — is not really 
analogous to ergodicity. Nevertheless, to formulate a quantum analog of ergodicity was 
von Neumann's motivation for the QET. 

Let us recall the concept of ergodicity (called "quasi-ergodicity" in the 1920s) in
statistical mechanics. Let $\Gamma_{\mathscr{E}}$ denote the energy surface,

$$ \Gamma_{\mathscr{E}} = \{X \in \mathbb{R}^{6N} : H(X) = \mathscr{E}\}, \qquad (24) $$

and $\mu_{\mathscr{E}}$ the (micro-canonical) invariant measure on $\Gamma_{\mathscr{E}}$ defined to be the limit of the
normalized phase space volume measure $\mu_{\mathrm{mc}}$ as both $\mathscr{E}_a \to \mathscr{E}$ and $\mathscr{E}_{a+1} \to \mathscr{E}$; in fact,
$\mu_{\mathscr{E}}$ is the surface area measure re-weighted with the inverse norm of the gradient of the
Hamiltonian function and normalized. The dynamics generated by $H$ on $\Gamma_{\mathscr{E}}$ is ergodic
if it has no non-trivial (measurable) constants of the motion. As a consequence of
Birkhoff's ergodic theorem [3], this is equivalent to the following: the fraction of time
that the phase point $X_t$ spends in a region $B \subseteq \Gamma_{\mathscr{E}}$ is in the long run proportional to the
size of the region, $\mu_{\mathscr{E}}(B)$, for almost every $X_0 \in \Gamma_{\mathscr{E}}$. ("Almost every" means that the
set of exceptions has measure zero; this is different from "most," which conveys that the
set of exceptions has small measure, but usually not zero.) Equivalently, time averages
coincide with phase-space averages (over the energy surface). Let $\delta_{X_t}$ denote the delta
measure concentrated at the phase point $X_t$. Then ergodicity is equivalent to

$$ \overline{\delta_{X_t}} = \mu_{\mathscr{E}} \qquad (25) $$
(with the time average understood in the sense of weak convergence) for almost every
$X_0 \in \Gamma_{\mathscr{E}}$.

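In quantum mechanics, if we regard a pure state $|\psi_t\rangle\langle\psi_t|$ as analogous to the pure
state $\delta_{X_t}$ and $\rho_{\mathrm{mc}}$ as analogous to $\mu_{\mathscr{E}}$, the statement analogous to (25) reads

$$ \overline{|\psi_t\rangle\langle\psi_t|} = \rho_{\mathrm{mc}}. \qquad (26) $$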



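As von Neumann pointed out in his QET article, the left hand side always exists$^{10}$ and can be computed as follows. If $\psi_0$ has coefficients $c_\alpha = \langle\phi_\alpha|\psi_0\rangle$ in the energy eigenbasis $\{\phi_\alpha\}$,

$$\psi_0 = \sum_{\alpha=1}^{D} c_\alpha |\phi_\alpha\rangle\,, \qquad (27)$$

then

$$\psi_t = \sum_{\alpha=1}^{D} e^{-iE_\alpha t/\hbar} c_\alpha |\phi_\alpha\rangle\,, \qquad (28)$$

and thus

$$\overline{|\psi_t\rangle\langle\psi_t|} = \sum_{\alpha,\beta=1}^{D} \overline{e^{-i(E_\alpha - E_\beta)t/\hbar}}\; c_\alpha c_\beta^* \,|\phi_\alpha\rangle\langle\phi_\beta|\,. \qquad (29)$$

Suppose that $H$ is non-degenerate. Then $E_\alpha - E_\beta$ vanishes only for $\alpha = \beta$, so the time-averaged exponential is $\delta_{\alpha\beta}$, and we have

$$\overline{|\psi_t\rangle\langle\psi_t|} = \sum_{\alpha} |c_\alpha|^2 \,|\phi_\alpha\rangle\langle\phi_\alpha|\,. \qquad (30)$$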

Since $\rho_{mc} = D^{-1}\sum_\alpha |\phi_\alpha\rangle\langle\phi_\alpha|$, the case (26) occurs only for those special wave functions that have $|c_\alpha|^2 = 1/D$ for all $\alpha$. That is, the most obvious quantum analog of ergodicity is almost never satisfied.
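As a rough numerical illustration of (29) and (30), the sketch below computes the time average of $|\psi_t\rangle\langle\psi_t|$ over a finite window $[0, T]$ in the energy eigenbasis. It uses the closed form of the averaged phase factor. The spectrum, the coefficients $c_\alpha$, the window length and the units ($\hbar = 1$) are hypothetical choices made for illustration. The off-diagonal entries decay like $1/T$, leaving the diagonal $|c_\alpha|^2$ of (30). This differs from $\rho_{mc}$, whose diagonal entries are all $1/D$, unless every $|c_\alpha|^2 = 1/D$.

```python
import cmath
import math


def time_averaged_state(c, energies, T, hbar=1.0):
    """Average of |psi_t><psi_t| over 0 <= t <= T, in the energy eigenbasis.

    Entry (a, b) is c_a * conj(c_b) times the time average of
    exp(-i (E_a - E_b) t / hbar), which is computed in closed form.
    """
    D = len(c)
    rho = [[0j] * D for _ in range(D)]
    for a in range(D):
        for b in range(D):
            omega = (energies[a] - energies[b]) / hbar
            if omega == 0.0:
                avg = 1.0  # equal energies: the phase does not rotate
            else:
                # (1/T) * integral_0^T exp(-i omega t) dt
                avg = (1 - cmath.exp(-1j * omega * T)) / (1j * omega * T)
            rho[a][b] = c[a] * complex(c[b]).conjugate() * avg
    return rho


# Hypothetical non-degenerate spectrum and initial coefficients c_alpha.
energies = [0.0, 1.0, math.sqrt(2.0)]
c = [0.6, 0.8j, 0.0]
rho_bar = time_averaged_state(c, energies, T=1e6)
for row in rho_bar:
    print([f"{abs(x):.2e}" for x in row])
```

Increasing `T` shrinks the off-diagonal entries further. The diagonal stays at $(0.36, 0.64, 0)$, far from $(1/3, 1/3, 1/3)$.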

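One can draw other analogies, though, by focusing just on the macroscopic appearance. This appearance is understood in terms of the macroscopic observables $M_1, \ldots, M_\ell$ mentioned in Section 2.1 above and the orthogonal decomposition $\mathscr{D} = \{\mathscr{H}_\nu\}$ they define. We say that two density matrices $\rho$ and $\rho'$ are macroscopically equivalent, in symbols

$$\rho \overset{\mathscr{D}}{\sim} \rho'\,, \qquad (31)$$

if and only if

$$\mathrm{tr}(\rho P_\nu) \approx \mathrm{tr}(\rho' P_\nu) \qquad (32)$$

for all $\nu$. For example, since $\mathrm{tr}(\rho_{mc} P_\nu) = d_\nu/D$, we have $|\psi\rangle\langle\psi| \overset{\mathscr{D}}{\sim} \rho_{mc}$ if and only if

$$\|P_\nu \psi\|^2 \approx \frac{d_\nu}{D} \qquad (33)$$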



10 This existence statement also follows, at least for almost every $\psi_0$, from the (classical) ergodic theorems of Birkhoff [3] and von Neumann [72]. However, the QET article appeared two years earlier.
for all $\nu$. This is exactly the condition considered in Claim 1 in Section 2.3, so it is true of most $\psi$ (provided $d_\nu \gg 1$). Returning to the time average, we obtain that $\overline{|\psi_t\rangle\langle\psi_t|} \overset{\mathscr{D}}{\sim} \rho_{mc}$ if and only if

$$\sum_{\alpha=1}^{D} |c_\alpha|^2 \,\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \qquad (34)$$

for all $\nu$. Condition (34) is satisfied for every $\psi_0 \in \mathbb{S}(\mathscr{H})$ if and only if

$$\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \qquad (35)$$

for every $\alpha$ and $\nu$. This is a condition on $H$ and $\mathscr{D}$ that follows from (17), and according to the QET it is therefore typically obeyed (see also [7]). The analogy between $\overline{|\psi_t\rangle\langle\psi_t|} \overset{\mathscr{D}}{\sim} \rho_{mc}$ and ergodicity lies in the fact that the time average of a pure state in a sense agrees with the micro-canonical ensemble. There are two differences: the agreement is only approximate and only on the macroscopic level, and it typically holds for every, rather than almost every, pure state.
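To get a feeling for condition (35), the following sketch draws a random orthonormal eigenbasis $\{\phi_\alpha\}$ and tabulates $\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle$. Drawing the eigenbasis at random is equivalent to drawing a random $\mathscr{D}$ relative to a fixed eigenbasis. The dimensions, the block layout of the macro-spaces, and the Gram-Schmidt construction of a unitarily invariant random basis are our own illustrative assumptions, not von Neumann's construction. The column sums equal $d_\nu$ exactly, so the column averages equal $d_\nu/D$. The individual entries scatter around $d_\nu/D$ by an amount of order $1/\sqrt{D}$, so (35) becomes a good approximation only for large $D$.

```python
import math
import random


def random_orthonormal_basis(D, rng):
    """Orthonormal basis of C^D with a unitarily invariant (Haar) distribution:
    modified Gram-Schmidt applied to i.i.d. complex Gaussian vectors."""
    basis = []
    while len(basis) < D:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
        for u in basis:
            overlap = sum(x.conjugate() * y for x, y in zip(u, v))  # <u|v>
            v = [y - overlap * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(abs(y) ** 2 for y in v))
        basis.append([y / norm for y in v])
    return basis


def diagonal_overlaps(eigenbasis, dims):
    """Table of <phi_alpha|P_nu|phi_alpha> for all alpha (rows) and nu (columns).

    Each phi_alpha is given by its components in a hypothetical macro basis whose
    first dims[0] vectors span H_1, the next dims[1] span H_2, and so on.
    """
    table = []
    for phi in eigenbasis:
        row, start = [], 0
        for d in dims:
            row.append(sum(abs(x) ** 2 for x in phi[start:start + d]))
            start += d
        table.append(row)
    return table


rng = random.Random(0)
D, dims = 120, [60, 40, 20]
table = diagonal_overlaps(random_orthonormal_basis(D, rng), dims)
worst = max(abs(row[nu] - dims[nu] / D) for row in table for nu in range(len(dims)))
print(f"largest deviation of <phi|P_nu|phi> from d_nu/D: {worst:.3f}")
```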

However, even more is true for many quantum systems: not just the time average but even $|\psi_t\rangle\langle\psi_t|$ itself is macroscopically equivalent to $\rho_{mc}$ for most times $t$, as expressed in (14). Thus, normality is in part stronger than ergodicity (it involves no time averaging) and in part weaker (it involves only macroscopic equivalence). In short, it is a different notion. In von Neumann's words (first paragraph of his Section 0.5):

[T]he agreement between time and microscopic average should only be 
required for macroscopic quantities. This weakening comes together with an 
essential strengthening that is made possible only by using the macroscopic 
perspective. Namely, we will show that for every state of the system the 
value of each (macroscopically measurable) quantity not only has time mean 
equal to the micro-canonical mean, but furthermore has small spread, i.e., 
the times at which the value deviates considerably from the mean are very 
infrequent. 
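The following sketch illustrates normality in the sense of (14) for a small random model. It evolves a $\psi_0$ that starts entirely inside one macro-space. At sampled times it records whether every $\|P_\nu\psi_t\|^2$ lies within a tolerance of $d_\nu/D$. All parameters are hypothetical: the dimension, the block sizes, the random energies and eigenbasis, the tolerance, the sampling times, and $\hbar = 1$. With $D$ this small the fluctuations are sizable, and the agreement is only expected to sharpen as the dimensions grow.

```python
import cmath
import math
import random


def haar_basis(D, rng):
    """Random orthonormal basis of C^D: modified Gram-Schmidt on complex Gaussians."""
    basis = []
    while len(basis) < D:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
        for u in basis:
            overlap = sum(x.conjugate() * y for x, y in zip(u, v))  # <u|v>
            v = [y - overlap * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(abs(y) ** 2 for y in v))
        basis.append([y / norm for y in v])
    return basis


def evolve(coeffs, eigvecs, energies, t, hbar=1.0):
    """psi_t = sum_alpha exp(-i E_alpha t / hbar) c_alpha phi_alpha, as in (28)."""
    psi_t = [0j] * len(eigvecs[0])
    for c, phi, E in zip(coeffs, eigvecs, energies):
        amp = cmath.exp(-1j * E * t / hbar) * c
        for j, x in enumerate(phi):
            psi_t[j] += amp * x
    return psi_t


def macro_weights(psi, dims):
    """||P_nu psi||^2, with H_nu the nu-th consecutive block of dims[nu] coordinates."""
    weights, start = [], 0
    for d in dims:
        weights.append(sum(abs(x) ** 2 for x in psi[start:start + d]))
        start += d
    return weights


rng = random.Random(1)
D, dims = 80, [40, 24, 16]
eigvecs = haar_basis(D, rng)                          # hypothetical eigenbasis of H
energies = [rng.uniform(0.0, 1.0) for _ in range(D)]  # hypothetical generic spectrum
psi0 = [1.0 + 0j] + [0j] * (D - 1)                    # starts inside H_1
coeffs = [sum(x.conjugate() * y for x, y in zip(phi, psi0)) for phi in eigvecs]

eps = 0.15
times = [10.0 * k for k in range(1, 151)]
hits = 0
for t in times:
    w = macro_weights(evolve(coeffs, eigvecs, energies, t), dims)
    if all(abs(w[nu] - dims[nu] / D) < eps for nu in range(len(dims))):
        hits += 1
print(f"fraction of sampled times with all |w_nu - d_nu/D| < {eps}: {hits / len(times):.2f}")
```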



3.2 Ergodic Components of the Schrödinger Evolution

Every dynamical system whose dynamics leaves invariant a probability distribution $\mu$ can be partitioned into its ergodic components [63]. That is, its phase space $\Gamma$ can be partitioned in a (more or less) unique way into disjoint subsets, $\Gamma = \bigcup_s \Gamma_s$ and $\Gamma_s \cap \Gamma_{s'} = \emptyset$ for $s \neq s'$, so that each $\Gamma_s$ is invariant under the dynamics and the dynamics is ergodic on $\Gamma_s$. Here each $\Gamma_s$ is equipped with a probability measure $\mu_s$ that it inherits from $\mu$.

In Section 0.4 of the QET article, von Neumann identifies the ergodic components of the Schrödinger dynamics, regarded as a dynamical system on the unit sphere of the Hilbert space $\mathscr{H}_{total}$. He does so at least when $\mathscr{H}_{total}$ is finite dimensional and the eigenvalues of $H$ are linearly independent over the rational numbers (which is the generic case). Here, the invariant distribution is the uniform distribution over the unit sphere, and the parameter $s$ is a sequence $(r_\alpha)$ of radii, one for each energy level. The component $\Gamma_s = T_{r_1, r_2, \ldots}$ is the torus defined by these radii,

$$T_{r_1, r_2, \ldots} = \Big\{ \sum_\alpha r_\alpha e^{i\theta_\alpha} |\phi_\alpha\rangle : 0 \le \theta_\alpha < 2\pi \Big\}\,. \qquad (36)$$
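In the energy eigenbasis, the Schrödinger evolution (28) leaves each radius $r_\alpha = |c_\alpha|$ fixed. It rotates each angle, $\theta_\alpha(t) = \theta_\alpha(0) - E_\alpha t/\hbar \pmod{2\pi}$, so it moves $\psi_t$ along the torus (36) by a linear flow. When the $E_\alpha$ are linearly independent over the rationals, such a linear flow is ergodic on the torus (Kronecker-Weyl). The sketch below checks only the first part numerically, using a hypothetical three-level spectrum and $\hbar = 1$.

```python
import cmath
import math


def evolve_coefficients(c, energies, t, hbar=1.0):
    """Energy-basis coefficients of psi_t: c_alpha -> exp(-i E_alpha t / hbar) c_alpha."""
    return [cmath.exp(-1j * E * t / hbar) * ca for ca, E in zip(c, energies)]


def torus_coordinates(c):
    """Radii r_alpha = |c_alpha| and angles theta_alpha in [0, 2 pi), as in (36)."""
    radii = [abs(ca) for ca in c]
    angles = [cmath.phase(ca) % (2 * math.pi) for ca in c]
    return radii, angles


energies = [1.0, math.sqrt(2.0), math.sqrt(3.0)]  # hypothetical, rationally independent
c0 = [0.5, 0.5j, math.sqrt(0.5)]                  # normalized: 0.25 + 0.25 + 0.5 = 1
for t in (0.0, 3.7, 1234.5):
    radii, angles = torus_coordinates(evolve_coefficients(c0, energies, t))
    print(f"t = {t:7.1f}  radii = {[round(r, 12) for r in radii]}  "
          f"angles = {[round(a, 3) for a in angles]}")
```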

4 Misunderstanding in the 1950s 

As noted before, the QET was widely dismissed after undeserved criticisms in [15] [5], which arose from a wrong idea of what the QET asserts. In this section we point out the nature of the misunderstanding. Let $p(\mathscr{D}, \psi_0)$ be the statement that the system with initial wave function $\psi_0$ is normal with respect to $\mathscr{D}$ (i.e., that (14) holds most of the time). The misunderstanding of Bocchieri and Loinger [5] consists of replacing the statement

$$\text{for most } \mathscr{D}: \text{ for all } \psi_0: \; p(\mathscr{D}, \psi_0)\,, \qquad (37)$$

which is part of the QET, with the inequivalent (in fact, weaker) statement

$$\text{for all } \psi_0: \text{ for most } \mathscr{D}: \; p(\mathscr{D}, \psi_0)\,. \qquad (38)$$

To see that these two statements are indeed inequivalent, let us illustrate the difference between "for most $x$: for all $y$: $p(x,y)$" and "for all $y$: for most $x$: $p(x,y)$" by two statements about a company:

Most employees are never ill. (39)

On each day, most employees are not ill. (40)

Let $x$ range over employees and $y$ over days, and let $p(x,y)$ be the statement "Employee $x$ is not ill on day $y$." Then "for most $x$: for all $y$: $p(x,y)$" is (39), and "for all $y$: for most $x$: $p(x,y)$" is (40). It is easy to see that (39) implies (40). But (40) does not imply (39), because it is quite plausible that most employees are sometimes ill, just not on the same day. Von Neumann was clearly aware of the difference between (37) and (38), as his footnote 37 in Section 3.1 shows:

Note: what we have shown is not that for every given $\psi$ or $A$ the ergodic theorem and the H-theorem hold for most $\omega_{\lambda,\nu,a}$, but that for most $\omega_{\lambda,\nu,a}$ they are universally valid, i.e., for all $\psi$ and $A$. The latter is, of course, much more [i.e., much stronger] than the former.
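The difference between the two quantifier orders can be made concrete with the company example. In the sketch below, "most" is taken to mean "more than half", and the ten employees and ten days are hypothetical. Employee $x$ is ill only on day $x$. On every day, then, most employees are healthy, so (40) holds. But every employee is ill once, so (39) fails.

```python
def most(flags, fraction=0.5):
    """'Most': strictly more than the given fraction of the statements are true."""
    flags = list(flags)
    return sum(flags) > fraction * len(flags)


def most_x_all_y(p, xs, ys):
    """'for most x: for all y: p(x, y)', the form of (37) and (39)."""
    return most(all(p(x, y) for y in ys) for x in xs)


def all_y_most_x(p, xs, ys):
    """'for all y: for most x: p(x, y)', the form of (38) and (40)."""
    return all(most(p(x, y) for x in xs) for y in ys)


employees = range(10)  # hypothetical company
days = range(10)


def not_ill(x, y):
    """Employee x is ill only on day x."""
    return x != y


print(most_x_all_y(not_ill, employees, days))  # (39): False, everyone is ill once
print(all_y_most_x(not_ill, employees, days))  # (40): True, 9 of 10 are healthy each day
```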

Also Schrödinger, by the way, was aware that von Neumann had proven (37), as his 1929 letter to von Neumann [60] shows:

You can show: if this rotation [i.e., the unitary operator mapping an eigenbasis of $H$ to a joint eigenbasis of $M_1, \ldots, M_\ell$] is large enough then the theorem holds with arbitrary accuracy. You can show further: the overwhelming majority of the conceivable rotations is indeed large enough, where the "overwhelming majority" is defined in an appropriate, rotation-invariant way. Given such a rotation, then the theorem holds for every psi.$^{11}$

To see how (37) and (38) are connected to the calculations in the QET article and to those of Bocchieri and Loinger [5], recall from above that the normality of $\psi_0$ with respect to $\mathscr{D}$ (i.e., the statement $p(\mathscr{D}, \psi_0)$) is equivalent to the statement that, for every $\nu$,

$$\overline{\Big(\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big)^2} \qquad (41)$$

is small. A straightforward calculation shows (see [22] or the QET article) that the quantity (41) is, for all $\psi_0$, less than or equal to the non-negative quantity $F_\nu(H, \mathscr{D})$ defined in (16), which is independent of $\psi_0$. This calculation is von Neumann's argument showing that smallness of $F_\nu = F_\nu(H, \mathscr{D})$ implies normality for every $\psi_0$. The main work involved in proving the QET, though, is to show that $F_\nu$ is small for most $\mathscr{D}$. This is done by showing that the average of $F_\nu$ over all $\mathscr{D}$ is small. Bocchieri and Loinger [5] replaced the two propositions

$$(41) \le F_\nu \qquad (42)$$

and

the $\mathscr{D}$-average of $F_\nu$ is small, (43)

with the single proposition

the $\mathscr{D}$-average of (41) is small. (44)

It can be proven easily that (44) is true for all $\psi_0$, provided the $d_\nu$ are sufficiently large, by interchanging the time average and the $\mathscr{D}$-average [5] [22]. However, this statement implies only (38), and not the stronger statement (37) needed for the QET. Indeed, assuming the $d_\nu$ are sufficiently large, (44) yields that for all $\psi_0$ it is true of most $\mathscr{D}$ and most $t$ that $\|P_\nu\psi_t\|^2 \approx d_\nu/D$. This is (38). In contrast, the two propositions (42) and (43) yield that for most $\mathscr{D}$ it is true of all $\psi_0$ that, for most $t$, $\|P_\nu\psi_t\|^2 \approx d_\nu/D$. This is (37).

The weaker statement (38) is indeed dynamically vacuous, as Bocchieri and Loinger criticized. It follows straightforwardly from a statement (true for large $d_\nu$) that does not involve the time evolution, namely that for every $\psi$,

the $\mathscr{D}$-average of $\big(\|P_\nu\psi\|^2 - d_\nu/D\big)^2$ is small. (45)

See [22] for a more detailed discussion. 
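Statement (45) can be checked without any reference to $H$ or to time. By unitary invariance, averaging $(\|P_\nu\psi\|^2 - d_\nu/D)^2$ over uniformly random orthonormal bases for a fixed $\psi$ gives the same result as fixing $P_\nu$ and averaging over uniformly random unit vectors $\psi$. In that case $\|P_\nu\psi\|^2$ follows a Beta$(d_\nu, D - d_\nu)$ distribution. Its variance $d_\nu(D-d_\nu)/(D^2(D+1))$ is below $1/(4D)$. The Monte Carlo sketch below compares the estimate with this exact value. The sizes and sample count are illustrative choices.

```python
import math
import random


def random_unit_vector(D, rng):
    """Uniformly distributed unit vector in C^D (normalized complex Gaussian)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def mean_squared_deviation(D, d, samples, rng):
    """Monte Carlo estimate of the D-average of (||P_nu psi||^2 - d/D)^2.

    By unitary invariance we fix P_nu (projection onto the first d coordinates)
    and average over uniformly random psi instead of over random bases.
    """
    total = 0.0
    for _ in range(samples):
        psi = random_unit_vector(D, rng)
        weight = sum(abs(x) ** 2 for x in psi[:d])
        total += (weight - d / D) ** 2
    return total / samples


D, d = 50, 10
estimate = mean_squared_deviation(D, d, samples=5000, rng=random.Random(2))
exact = d * (D - d) / (D ** 2 * (D + 1))  # variance of a Beta(d, D - d) variable
print(f"Monte Carlo: {estimate:.5f}   exact: {exact:.5f}")
```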

Farquhar and Landsberg [15] also mistook the QET for a different statement, one that is in fact inequivalent to the statement considered by Bocchieri and Loinger. Their version differs from von Neumann's in two ways. First, the order of the quantifiers differs, as in (37) and (38). Second, it concerns only the time average of $\|P_\nu\psi_t\|^2$, whereas von Neumann's QET makes a statement about the value of $\|P_\nu\psi_t\|^2$ for most times.

11 Translated from the German by R. Tumulka. 
5 Approach to Thermal Equilibrium



Von Neumann's QET, or the phenomenon of normal typicality, is closely connected with the approach to thermal equilibrium. As mentioned already, there is no consensus about what it means for a macroscopic system to be in "thermal equilibrium." In Section 5.1 we outline several different concepts of thermal equilibrium, and in Section 5.2 several different concepts of approach to thermal equilibrium. In Section 5.3 we then compare the QET to more recent results.

5.1 Definitions of Thermal Equilibrium 

We begin with the concept of thermal equilibrium that seems to us the most fundamental. It can be shown in many cases, and is expected to be true generally, that for a physically reasonable choice of the macro-observables, one of the macro-spaces $\mathscr{H}_\nu$ corresponds to thermal equilibrium. This macro-space $\mathscr{H}_{eq}$ satisfies

$$\frac{d_{eq}}{D} \approx 1\,. \qquad (46)$$

In fact, the difference $1 - d_{eq}/D$ is exponentially small in the number of particles. This implies, in particular, that each of the macro-observables $M_i$ is "nearly constant" on the energy shell $\mathscr{H}$, in the sense that one of its eigenvalues has multiplicity at least $d_{eq} \approx D$.

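We say that a system in the quantum state $\psi \in \mathbb{S}(\mathscr{H})$ is in thermal equilibrium if and only if $\psi$ is very close (in the Hilbert space norm) to $\mathscr{H}_{eq}$. Equivalently, the system is in thermal equilibrium if and only if

$$\langle\psi|P_{eq}|\psi\rangle \approx 1\,, \qquad (47)$$

where $P_{eq}$ is the projection operator to $\mathscr{H}_{eq}$.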

The condition (47) implies that a quantum measurement of the macroscopic observable $M_i$ on a system with wave function $\psi$ will yield the "equilibrium" value of $M_i$ with probability close to 1. Likewise, a joint measurement of $M_1, \ldots, M_\ell$ will yield their equilibrium values with probability close to 1. It follows from (46) that most $\psi$ on the unit sphere in $\mathscr{H}$ are in thermal equilibrium. Indeed, with $\mu(d\psi)$ the uniform measure on the unit sphere,

$$\int \mu(d\psi)\, \langle\psi|P_{eq}|\psi\rangle = \frac{\mathrm{tr}\, P_{eq}}{D} = \frac{d_{eq}}{D} \approx 1\,. \qquad (48)$$

Since the quantity $\langle\psi|P_{eq}|\psi\rangle$ is bounded from above by 1, most $\psi$ must satisfy (47).
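The step from (48) to "most $\psi$ satisfy (47)" is Markov's inequality. It is applied to the non-negative quantity $1 - \langle\psi|P_{eq}|\psi\rangle$, whose average is $\varepsilon = 1 - d_{eq}/D$. The inequality says that the uniform measure of the set where $\langle\psi|P_{eq}|\psi\rangle < 1 - \delta$ is at most $\varepsilon/\delta$. The sketch below estimates that measure by sampling random unit vectors and compares it with the bound. The values of $D$, $d_{eq}$ and $\delta$, and the choice of $P_{eq}$ as a coordinate projection, are hypothetical.

```python
import math
import random


def random_unit_vector(D, rng):
    """Uniformly distributed unit vector in C^D (normalized complex Gaussian)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def non_equilibrium_fraction(D, d_eq, delta, samples, rng):
    """Fraction of sampled psi with <psi|P_eq|psi> < 1 - delta.

    P_eq is taken (hypothetically) to project onto the first d_eq coordinates.
    """
    count = 0
    for _ in range(samples):
        psi = random_unit_vector(D, rng)
        if sum(abs(x) ** 2 for x in psi[:d_eq]) < 1 - delta:
            count += 1
    return count / samples


D, d_eq, delta = 200, 190, 0.2
markov_bound = (1 - d_eq / D) / delta  # epsilon / delta
fraction = non_equilibrium_fraction(D, d_eq, delta, samples=2000, rng=random.Random(3))
print(f"sampled fraction: {fraction:.4f}   Markov bound: {markov_bound:.2f}")
```

The bound uses only the average (48), which is why it is loose. For large $D$, $\langle\psi|P_{eq}|\psi\rangle$ is in fact sharply concentrated near $d_{eq}/D$.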

If a system is normal, then it is in thermal equilibrium (as defined above) most of the time. After all, being normal implies that $\|P_{eq}\psi_t\|^2 \approx d_{eq}/D$ most of the time, which is close to 1. (Of course, if the system is not in equilibrium initially, the waiting time until it first reaches equilibrium is not specified, and it may be longer than the present age of the universe.) That is why we regard as important the case in which one of the $\mathscr{H}_\nu$ has the overwhelming majority of dimensions. Von Neumann, though, did not consider this case, and his QET actually has technical assumptions that are violated in this case. In [23] we proved a theorem about normal typicality that applies to this case and thus complements von Neumann's QET. It asserts that for most Hamiltonians with given non-degenerate eigenvalues (or, alternatively, for most $\mathscr{D}$), all initial state vectors $\psi_0$ evolve in such a way that $\psi_t$ is in thermal equilibrium (according to the definition (47) above) for most times $t$.$^{12}$

The above definition of thermal equilibrium in quantum mechanics is an example of what we called the "individualist" view. It is analogous to the following definition in classical mechanics. Let $\Gamma$ be the energy shell as in (22), and let $\{\Gamma_\nu\}$ be a partition into regions corresponding to macro-states, as described earlier. It has been shown [35] for realistic systems with large $N$ that one of the regions $\Gamma_\nu$ corresponds to the macro-state of thermal equilibrium. This region, denoted $\Gamma_{eq}$, satisfies

$$\frac{\mathrm{vol}(\Gamma_{eq})}{\mathrm{vol}(\Gamma)} \approx 1\,. \qquad (49)$$

We say that a classical system with phase point $X$ is in thermal equilibrium if $X \in \Gamma_{eq}$. The analogy with the quantum mechanical definition (47) arises from regarding both $\psi$ and $X$ as instances of individual pure states.

We now turn to what we called the "ensemblist" view (for comparisons between the individualist and the ensemblist views see also [20] [39]). In classical mechanics, the ensemblist says that a system is in thermal equilibrium if and only if it is described by a probability distribution $\mu$ over phase space that is close to the appropriate distribution of thermal equilibrium (e.g., [66] [31] [32] [33] [59]). That is, either

$$\mu \approx \mu_E\,, \qquad (50)$$

where $\mu_E$ is the uniform distribution on the energy surface $\Gamma_E = \{X \in \mathbb{R}^{6N} : H(X) = E\}$, or

$$\mu \approx \mu_{can}\,, \qquad (51)$$

where $\mu_{can}$ is the canonical distribution, which has density (relative to the phase space volume measure) proportional to

$$e^{-\beta H(X)}\,, \qquad (52)$$

with $\beta$ the inverse temperature. Correspondingly, the ensemblist would say that a quantum system is in thermal equilibrium if and only if it is described by a density matrix $\rho$ that is close to the appropriate density matrix of thermal equilibrium (e.g., [66] [33]). That is, either

$$\rho \approx \rho_{mc} \qquad (53)$$

or

$$\rho \approx \rho_{can}\,, \qquad (54)$$



12 The phenomenon of Anderson localization provides an example of exceptional Hamiltonians that behave differently (see in particular [1] [48]). Certain physically relevant Hamiltonians possess some eigenfunctions $\phi_\alpha$ whose spatial energy density is macroscopically non-uniform, whereas wave functions in $\mathscr{H}_{eq}$ should have macroscopically uniform energy density over the entire available volume. These eigenfunctions are therefore examples of wave functions $\psi_0$ that evolve in such a way that $\psi_t$ is never in thermal equilibrium.
where $\rho_{can}$ is the canonical density matrix

$$\rho_{can} = \frac{e^{-\beta H}}{\mathrm{tr}\, e^{-\beta H}} \qquad (55)$$

and $H$ is the system's Hamiltonian.

Consider a single classical system. The individualist insists that it has a phase point $X$, well-defined though usually unknown, but no distribution $\mu$ (except the delta distribution $\delta_X$), because it is a single system. The ensemblist thinks that our knowledge of $X$, which is always incomplete, should be represented by a probability distribution $\mu$. Thus, the individualist regards thermal equilibrium as an objective event and the ensemblist as a subjective one: the ensemblist's notion of thermal equilibrium has an information-theoretic nature.

Other definitions of thermal equilibrium are inspired by both the ensemblist and the individualist views. Von Neumann would have said, we think, that a system with wave function $\psi \in \mathbb{S}(\mathscr{H})$ is in thermal equilibrium if and only if $\|P_\nu\psi\|^2 \approx d_\nu/D$ for all $\nu$. That is, the probability distribution over the $\nu$s defined by $\psi$ coincides approximately with the micro-canonical distribution, or $|\psi\rangle\langle\psi| \overset{\mathscr{D}}{\sim} \rho_{mc}$. Like the individualist definition (47), and unlike the ensemblist definitions (53) and (54), this definition can be satisfied by a system in a pure state. The ensemblist spirit comes into play in asking whether the probability distribution of $\nu$ is close to micro-canonical. In classical mechanics, an individual phase point $X$ defines only a delta distribution, which is far from micro-canonical. In quantum mechanics, by contrast, $\psi$ defines a distribution over $\nu$ that is indeed approximately micro-canonical, even for an individual system.

That von Neumann defined thermal equilibrium in this way explains why he did not consider the case in which one of the $d_\nu$ is close to $D$. Note also that, with this definition, being normal immediately means being in thermal equilibrium most of the time. The QET is therefore a statement about thermal equilibrium, although von Neumann never explicitly mentioned thermal equilibrium in his QET article.

Here is another definition inspired by both views. Consider a bi-partite system consisting of subsystem 1 and subsystem 2, with Hilbert space $\mathscr{H}_{1\cup 2} = \mathscr{H}_1 \otimes \mathscr{H}_2$, and suppose it is in a pure state $\psi \in \mathscr{H}_{1\cup 2}$. One might say that subsystem 1 is in thermal equilibrium if and only if

$$\rho_1 \approx \rho_{can}\,, \qquad (56)$$

where $\rho_1 = \mathrm{tr}_2 |\psi\rangle\langle\psi|$ is the reduced density matrix of subsystem 1. Here $\rho_{can}$ is given by (55), with some $\beta$ and with the Hamiltonian of subsystem 1 in place of $H$. (One might also say that a system is in thermal equilibrium if every small subsystem has a reduced density matrix that is approximately canonical.) The ensemblist spirit is visible in the similarity between (54) and (56). Nevertheless, this definition is more on the individualist side, because the whole system is assumed to be in a pure state.
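As an illustration of (55) and (56), the sketch below builds $\rho_{can}$ for a hypothetical four-level subsystem Hamiltonian. The Hamiltonian is diagonal in its eigenbasis, and units with $k = 1$ are used, so $\beta$ is an inverse energy. The sketch takes $\mathscr{H}_2$ to be a copy of $\mathscr{H}_1$ and forms the pure state $\psi = \sum_a \sqrt{p_a}\, |a\rangle_1 |a\rangle_2$, with $p_a$ the canonical weights. The partial trace over subsystem 2 reproduces $\rho_{can}$ exactly. This shows that (56) can hold even though the composite system is in a pure state. The purification is a textbook construction chosen for illustration, not a model of how equilibrium arises.

```python
import math


def canonical_weights(energies, beta):
    """Diagonal of rho_can = exp(-beta H) / tr exp(-beta H) in the eigenbasis of H, eq. (55)."""
    e_min = min(energies)  # shift for numerical stability; it cancels in the ratio
    boltzmann = [math.exp(-beta * (E - e_min)) for E in energies]
    Z = sum(boltzmann)
    return [b / Z for b in boltzmann]


def reduced_density_matrix(psi, n1, n2):
    """rho_1 = tr_2 |psi><psi| for psi in H_1 (x) H_2, stored as psi[i * n2 + j]."""
    rho1 = [[0j] * n1 for _ in range(n1)]
    for i in range(n1):
        for k in range(n1):
            rho1[i][k] = sum(psi[i * n2 + j] * psi[k * n2 + j].conjugate() for j in range(n2))
    return rho1


energies1 = [0.0, 0.7, 1.3, 2.0]  # hypothetical subsystem-1 spectrum
beta = 1.5                        # hypothetical inverse temperature
p = canonical_weights(energies1, beta)
n = len(energies1)
psi = [0j] * (n * n)
for a in range(n):
    psi[a * n + a] = math.sqrt(p[a])  # psi = sum_a sqrt(p_a) |a>_1 |a>_2
rho1 = reduced_density_matrix(psi, n, n)
err = max(abs(rho1[i][k] - (p[i] if i == k else 0.0)) for i in range(n) for k in range(n))
print(f"max entrywise |rho_1 - rho_can|: {err:.1e}")
```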
5.2 Definitions of Approach to Thermal Equilibrium



Corresponding to the different notions of what it means for a quantum system to be in thermal equilibrium, there are different notions of what it means to approach thermal equilibrium. Individualists (like us) consider an isolated system of finitely many particles in a pure state $\psi$ whose time evolution is unitary. They say that the system approaches thermal equilibrium if and only if $\psi_t$, for some $t > 0$, belongs to the set of $\psi$s in thermal equilibrium (and remains in that set for a very long time). The ensemblist, one might imagine, would say that a system approaches thermal equilibrium if and only if its density matrix $\rho_t$, for some $t > 0$, is close to the appropriate density matrix of thermal equilibrium ($\rho_{mc}$ or $\rho_{can}$). However, inspired by the situation in classical mechanics (see below), ensemblists tend to demand more. They say that a system approaches thermal equilibrium if and only if its density matrix $\rho_t$ converges, as $t \to \infty$, to $\rho_{mc}$ or $\rho_{can}$.

This ensemblist notion of approach to thermal equilibrium is certainly mathematically appealing. However, it is very hard for it to hold. Consider an isolated system of finitely many particles with unitary time evolution. Then neither a (non-equilibrium) mixed state $\rho_t$ of that system nor the reduced density matrix $\rho_{1,t} = \mathrm{tr}_2\, \rho_t$ of a subsystem will converge, as $t \to \infty$, to $\rho_{mc}$ or $\rho_{can}$. This is because of the recurrence properties of the unitary evolution. The Hilbert space is finite-dimensional if we consider only a finite energy interval such as $[E_\alpha, E_{\alpha+1}]$ for finitely many particles in a finite volume. In that case there are arbitrarily large $t > 0$ such that the unitary time evolution operator

$$U_t = \exp(-iHt/\hbar) \qquad (57)$$

is arbitrarily close to the identity operator. This fact is a consequence of the quasi-periodicity of the unitary evolution. Thus, the density matrix

$$\rho_t = U_t \rho_0 U_t^* \qquad (58)$$

keeps on returning to near its initial state, and so does $\rho_{1,t} = \mathrm{tr}_2\, \rho_t$.$^{13}$
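Quasi-periodicity can be seen directly for a diagonal $H$. Then $U_t$ is diagonal with entries $e^{-iE_\alpha t/\hbar}$, so its operator-norm distance to the identity is $\max_\alpha |e^{-iE_\alpha t/\hbar} - 1|$. The brute-force scan below looks for times at which this distance drops below a tolerance. The three-level spectrum, the tolerance, the grid step and $\hbar = 1$ are hypothetical choices. The scan illustrates recurrence; it does not prove it.

```python
import cmath
import math


def distance_from_identity(energies, t, hbar=1.0):
    """||U_t - 1|| in operator norm for U_t = exp(-iHt/hbar) with H diagonal.

    U_t is diagonal with entries exp(-i E_alpha t / hbar), so the norm is the
    largest |exp(-i E_alpha t / hbar) - 1|.
    """
    return max(abs(cmath.exp(-1j * E * t / hbar) - 1) for E in energies)


def first_recurrence(energies, eps, t_min, t_max, dt):
    """First grid time t in [t_min, t_max] (step dt) with ||U_t - 1|| < eps, else None."""
    steps = int((t_max - t_min) / dt)
    for k in range(steps + 1):
        t = t_min + k * dt
        if distance_from_identity(energies, t) < eps:
            return t
    return None


energies = [0.0, 1.0, math.sqrt(2.0)]  # hypothetical spectrum with incommensurate gaps
for t_min in (1.0, 100.0, 1000.0):
    t = first_recurrence(energies, eps=0.3, t_min=t_min, t_max=t_min + 2000.0, dt=0.01)
    print(f"searching from t = {t_min:6.1f}: first near-recurrence at {t}")
```

Shrinking `eps` or adding energy levels makes the recurrences much rarer. For a macroscopic system the recurrence times are therefore astronomically long.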

It also follows from recurrence that, for an individualist, a system starting in a non-equilibrium state $\psi_0$ cannot remain in thermal equilibrium forever after reaching it. A valid statement can at best assert that the system will spend most of the time in thermal equilibrium. That is, it will again and again undergo excursions away from thermal equilibrium, but in between it will spend overwhelmingly long periods in thermal equilibrium. In fact, our theorem in [23] asserts the following. When $D$ is large enough and one $d_\nu = d_{eq}$ has the vast majority of dimensions as in (46), then for most non-degenerate Hamiltonians, all initial pure states $\psi_0$ will spend most of the time in thermal equilibrium in the sense of the definition (47).

13 Suppose $\mathscr{H}$ is infinite-dimensional and $H$ has discrete spectrum (as it would if the system is confined to a finite volume). Then $U_t$ may not be close to the identity for any $t > 0$, but every density matrix $\rho_0$ still keeps on returning to near its initial state. Indeed, $\rho_0$ can be approximated by an operator $\rho'_0$ of finite rank (i.e., a mixture of only finitely many pure states). This in turn can be approximated by an operator $\rho''_0$ on a subspace spanned by finitely many energy eigenstates. Quasi-periodicity implies that there are arbitrarily large $t > 0$ with $\rho''_t \approx \rho''_0$, and at such times also $\rho_t \approx \rho_0$.
To avoid recurrence in quantum physics, the ensemblist considers infinite systems, which are described not by Hilbert spaces but by C* or W* algebras. An example is a finite system of interest coupled to an infinitely large heat bath that is initially in thermal equilibrium (see, e.g., [68]). In fact, for such situations the convergence $\rho_{1,t} \to \rho_{can}$ has been proved rigorously [58] [21] [2]; see also [67] and references therein. Thus, for the ensemblist, the approach to thermal equilibrium is an idealization that never occurs in the real world.

Curiously, the ensemblist's invocation of infinite systems is unnecessary in classical mechanics. The time evolution of a classical system is usually not quasi-periodic, so the recurrence of mixed states described above need not arise. The ensemblist's approach to thermal equilibrium is then connected to the property of being mixing [32], a property related to (but stronger than) ergodicity. For any probability distribution $\mu$ of $X$, let $\mu_t$ denote the distribution of $X_t$. The dynamics on $\Gamma_E$ is mixing if $\mu_t \to \mu_E$ as $t \to \infty$ (in the sense of convergence on bounded functions) for every absolutely continuous probability distribution $\mu$, i.e., every $\mu$ that has a density relative to $\mu_E$. In contrast, the individualist expects the following for realistic classical systems with a sufficiently large number $N$ of constituents: for every macro-state $\nu$, most initial phase points $X \in \Gamma_\nu$ will be such that $X_t$ spends most of the time in the set $\Gamma_{eq}$. This statement follows if the time evolution in phase space is ergodic, but it is in fact much weaker than ergodicity.

5.3 Current Research on the Approach to Thermal Equilibrium

Various results about the approach to equilibrium in the individualist framework have been obtained in recent years [12] [65] [55] [42] [23]. Many of these results can be described in a unified way as follows. We say that a system with initial wave function $\psi(0)$ equilibrates relative to a class $\mathscr{A}$ of observables if, for most times $\tau$,

$$\langle\psi_\tau|A|\psi_\tau\rangle \approx \mathrm{tr}\Big(\overline{|\psi_t\rangle\langle\psi_t|}\, A\Big) \quad \text{for all } A \in \mathscr{A}\,. \qquad (59)$$

We then say that the system thermalizes relative to $\mathscr{A}$ if it equilibrates and, moreover,

$$\mathrm{tr}\Big(\overline{|\psi_t\rangle\langle\psi_t|}\, A\Big) \approx \mathrm{tr}(\rho_{mc} A) \quad \text{for all } A \in \mathscr{A}\,. \qquad (60)$$

That is, the system thermalizes relative to $\mathscr{A}$ if, for most times $t$,

$$\langle\psi_t|A|\psi_t\rangle \approx \mathrm{tr}(\rho_{mc} A) \quad \text{for all } A \in \mathscr{A}\,. \qquad (61)$$

With these definitions, the results of [65] [42], as well as von Neumann's QET, can be formulated as follows: under suitable hypotheses on $H$ and $\psi(0)$, and for large enough $D$, a system will equilibrate, or even thermalize, relative to a suitable class $\mathscr{A}$. (In fact, for a large class of observables $A$ on $\mathscr{H}$ it should be true that $\langle\psi|A|\psi\rangle \approx \mathrm{tr}(\rho_{mc} A)$ for most $\psi$ on the unit sphere in $\mathscr{H}$. If this is true of every member of $\mathscr{A}$, then it is not hard to see that most initial wave functions will thermalize relative to $\mathscr{A}$.)
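The definitions (59) to (61) can be tested numerically when $A$ is given as a matrix in the energy eigenbasis of a non-degenerate $H$. By (30), the equilibrium value in (59) is then $\sum_\alpha |c_\alpha|^2 A_{\alpha\alpha}$, and $\mathrm{tr}(\rho_{mc}A) = \mathrm{tr}(A)/D$. The sketch below reports the fraction of sampled times at which each approximation holds within a tolerance. It uses a hypothetical four-level spectrum, a hypothetical observable, and $\hbar = 1$. A state with spread-out coefficients thermalizes relative to this $A$. An energy eigenstate equilibrates trivially but does not thermalize.

```python
import cmath


def expectation(c, energies, A, t, hbar=1.0):
    """<psi_t|A|psi_t>, with A a Hermitian matrix in the energy eigenbasis."""
    ct = [cmath.exp(-1j * E * t / hbar) * ca for ca, E in zip(c, energies)]
    D = len(ct)
    return sum(ct[a].conjugate() * A[a][b] * ct[b] for a in range(D) for b in range(D)).real


def reference_values(c, A):
    """Right-hand sides of (59) and (61): tr(time-averaged state A), via (30), and tr(rho_mc A)."""
    D = len(c)
    time_avg = sum(abs(c[a]) ** 2 * A[a][a] for a in range(D))
    micro = sum(A[a][a] for a in range(D)) / D
    return time_avg, micro


def fractions(c, energies, A, times, eps):
    """Fractions of the sampled times at which (59) and (61) hold to within eps."""
    time_avg, micro = reference_values(c, A)
    values = [expectation(c, energies, A, t) for t in times]
    equilibrated = sum(abs(v - time_avg) < eps for v in values) / len(values)
    thermal = sum(abs(v - micro) < eps for v in values) / len(values)
    return equilibrated, thermal


energies = [0.0, 1.0, 2 ** 0.5, 3 ** 0.5]  # hypothetical non-degenerate spectrum
A = [[1.0, 0.0, 0.02, 0.0],                # hypothetical real symmetric observable
     [0.0, 1.0, 0.0, 0.0],
     [0.02, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
times = [0.37 * k for k in range(50)]
c_spread = [0.5, 0.5j, -0.5, 0.5]  # |c_alpha|^2 = 1/D for every alpha
c_eigen = [1.0, 0.0, 0.0, 0.0]     # an energy eigenstate
print(fractions(c_spread, energies, A, times, eps=0.05))  # (1.0, 1.0)
print(fractions(c_eigen, energies, A, times, eps=0.05))   # (1.0, 0.0)
```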
Von Neumann established in his QET, under the assumptions described in Section 2.3, thermalization for a family $\mathscr{A}$ of commuting observables: $\mathscr{A}$ is the algebra generated by $\{M_1, \ldots, M_\ell\}$. Rigol, Dunjko, and Olshanii [57] numerically simulated a model system and concluded that it thermalizes relative to a certain class $\mathscr{A}$ consisting of commuting observables. Our result in [23] takes $\mathscr{A}$ to contain just one operator, namely $P_{eq}$. We established thermalization for arbitrary $\psi(0)$, assuming that $H$ is non-degenerate and satisfies $\langle\phi_\alpha|P_{eq}|\phi_\alpha\rangle \approx 1$ for all $\alpha$, which (we showed) is typically true. Tasaki [65], as well as Linden, Popescu, Short, and Winter [42], considered a system coupled to a heat bath, $\mathscr{H}_{total} = \mathscr{H}_{sys} \otimes \mathscr{H}_{bath}$, and took $\mathscr{A}$ to contain all operators of the form $A_{sys} \otimes 1_{bath}$. Tasaki considered a rather special class of Hamiltonians and established thermalization assuming that many eigenstates of $H$ contribute to $\psi_0$. Under a similar assumption on $\psi_0$, Linden et al. established equilibration for $H$ without resonances. They also established a result in the direction of thermalization under the additional hypothesis that the dimension of the energy shell of the bath is much greater than $\dim \mathscr{H}_{sys}$. Reimann's mathematical result [55] can be described in the above scheme as follows. Let $\mathscr{A}$ be the set of all observables $A$ with (possibly degenerate) eigenvalues between 0 and 1 such that the absolute difference between any two distinct eigenvalues is at least (say) $10^{-1000}$. Reimann established equilibration for $H$ without resonances, assuming that many eigenstates of $H$ contribute to $\psi_0$.

6 The Method of Appeal to Typicality 

We would like to clarify the status of statements about "most" or "typical" $\mathscr{D}$ (or, for that matter, most $H$ or most $\psi_0$), and in so doing elaborate on von Neumann's method of appeal to typicality. In 1955, Fierz criticized this method as follows [16, p. 711]:

The physical justification of the hypothesis [that all $\mathscr{D}$'s are equally probable] is of course questionable, as the assumption of equal probability for all observers is entirely without reason. Not every macroscopic observable in the sense of von Neumann will really be measurable. Moreover, the observer will try to measure exactly those quantities which appear characteristic of a given system.

In the same vein, Pauli wrote in a private letter to Fierz in 1956 [19]:

As far as assumption B [that all $\mathscr{D}$s are equally probable] is concerned [...] I consider it now not only as lacking in plausibility, but nonsense.

Concerning these objections, we first note that it is surely informative that normality holds for some $\mathscr{D}$s, let alone that it in fact holds for most $\mathscr{D}$s, with "most" understood in a mathematically natural way. But we believe that more should be said.

When employing the method of appeal to typicality, one usually uses the language of probability theory. But that does not imply that any of the objects considered is random in reality. Rather, it means that certain sets (of wave functions, of orthonormal bases, etc.) have certain sizes (e.g., close to 1) in terms of certain natural (normalized) measures of size. That is, one describes the behavior that is typical of wave functions, orthonormal bases, etc. Since the mathematics is equivalent to that of probability theory, it is convenient to adopt that language. For this reason, using a normalized measure $\mu$ does not mean making an "assumption of equal probability," even if one uses the word "probability." Rather, if a condition is true of most $\mathscr{D}$, or most $H$, this fact may suggest that the condition is also true of a concrete given system, unless we have reasons to expect otherwise.

Of course, a theorem saying that a condition is true of the vast majority of systems does not prove anything about a concrete given system. If we want to know for sure whether a given system is normal for every initial wave function, we need to check the relevant condition, which is (17) above. Nevertheless, a typicality theorem is, as we have suggested, illuminating. At the very least, it is certainly useful to know which behavior is typical and which is exceptional. Note also that calling a system "typical" or "atypical" might easily lead us to conclude, wrongly, that an atypical system will not be normal. A given system may have some properties that are atypical and nevertheless satisfy the condition (17), which implies that the system is normal for every initial wave function.

The method of appeal to typicality belongs to a long tradition in physics, which also includes Wigner's work on random matrices in the 1950s. In the words of Wigner:

One [. . . ] deals with a specific system, with its proper (though in many 
cases unknown) Hamiltonian, yet pretends that one deals with a multitude 
of systems, all with their own Hamiltonians, and averages over the properties 
of these systems. Evidently, such a procedure can be meaningful only if it 
turns out that the properties in which one is interested are the same for the 
vast majority of the admissible Hamiltonians. 

This method was used by Wigner to obtain specific new and surprising predictions about 
detailed properties of complex quantum systems in nuclear physics. 

Suppose we know of a given system that its Hamiltonian $H$ belongs to a particular small subset $S_0$ of the set $S$ of all self-adjoint operators on the appropriate Hilbert space. Then two kinds of typicality theorems are of interest: one saying that the relevant behavior occurs for most $H \in S_0$, the other saying that it occurs for most $H \in S$. The former does not follow from the latter when $S_0$ is very small compared to $S$, since the latter would then be consistent with $S_0$ consisting exclusively of exceptional $H$s. Nor does the latter follow from the former, so the two statements are logically independent. In fact, both are of interest because each has its merits. The typicality theorem about $S_0$ gives us more certainty that the given system, whose Hamiltonian belongs to $S_0$, will behave in the relevant way. The typicality theorem about $S$ gives us a deeper understanding of why the relevant behavior occurs, as it indicates that the behavior has little to do with $S_0$ and is widespread all over $S$. That is, there is a reciprocal relation: the greater the degree of certainty that a typicality theorem confers, the less its explanatory power.
7 Von Neumann's Quantum H-Theorem



In his proof of the QET, von Neumann describes parallel considerations that prove a second theorem, which he calls the "quantum H-theorem." This theorem concerns the long-time behavior of the quantity $S$ that von Neumann defines in his equation (34) to be the entropy of a system with wave function $\psi$:

$$S(\psi) = -k \sum_{\nu} \|P_\nu \psi\|^2 \log \frac{\|P_\nu \psi\|^2}{d_\nu}\,, \qquad (62)$$

where $k$ is the Boltzmann constant and log denotes the natural logarithm. (This formula looks a bit simpler than von Neumann's (34) because we consider only $\psi$s that lie in a particular micro-canonical Hilbert space $\mathscr{H}$.) Note that this definition differs from the one usually known as the von Neumann entropy,

$$S_{vN}(\rho) = -k\, \mathrm{tr}(\rho \log \rho) \qquad (63)$$

for a system with density matrix $\rho$, which von Neumann had introduced two years earlier [70].

In Section 7.1 we discuss the definition (62). As with thermal equilibrium, there is no consensus about the definition of entropy in quantum mechanics. In Section 7.2 we discuss the contents of von Neumann's "quantum H-theorem."



7.1 Von Neumann's Definition of Entropy 

We begin with a brief overview of several approaches. We first recall Boltzmann's entropy definition [8] for a macroscopic classical system. Consider an energy shell $\Gamma$ as defined in (22), partitioned into subsets $\Gamma_\nu$ corresponding to different macro-states $\nu$. A system with phase point $X \in \Gamma$ has Boltzmann entropy

$$S_B(X) = k \log \mathrm{vol}(\Gamma_\nu) \quad \text{if and only if } X \in \Gamma_\nu\,, \qquad (64)$$

where $\mathrm{vol}(\Gamma_\nu)$ is the phase space volume of $\Gamma_\nu$; see also [20] [38]. For a quantum mechanical analog, consider a macroscopic quantum system, an energy shell $\mathscr{H}$ as defined in Section 2.1, and an orthogonal decomposition into subspaces $\mathscr{H}_\nu \subseteq \mathscr{H}$ corresponding to different macro-states $\nu$. We define the quantum Boltzmann entropy of a system with wave function $\psi \in \mathscr{H}_\nu$ by

$$S_B(\psi) = k \log d_\nu\,. \qquad (65)$$

We also denote this quantity by S^iy). (See also [38j for a discussion of this formula. 
A version of it, with d v the "number of elementary states," was used already by Einstein 
in 1914 PH Eq. (4a)].) 

While the Boltzmann entropy is based on the individualist view, there is a counterpart in the ensemblist view, known as the Gibbs entropy in classical mechanics; it is defined for a system described by the probability density ρ on phase space ℝ^{6N} to be

$$ S_G(\rho) = -k \int dX\, \rho(X) \log \rho(X). \qquad (66) $$
Its quantum mechanical analog is the von Neumann entropy, defined for a system described by the density matrix ρ on ℋ_total by (63). In his QET article, von Neumann writes in Section 1.3: "The [expression (63)] for entropy [... is] not applicable here in the way [it was] intended, as [it was] computed from the perspective of an observer who can carry out all measurements that are possible in principle, i.e., regardless of whether they are macroscopic." We agree that (63) is not applicable to the macroscopic quantum system von Neumann considers in his QET article, but for a different reason. Von Neumann's reason is of an ensemblist, information-theoretic nature, supposing that the value of the entropy quantifies the (possible) knowledge of an observer. In an individualist framework, where the system is regarded as being in a pure state, the von Neumann entropy (63) is clearly inadequate because it always yields the value zero; more fundamentally, an individualist regards entropy not as measuring the spread of a probability distribution, but as measuring the size of a macro-state. However, note that for ρ = d_ν^{-1} P_ν and ψ ∈ ℋ_ν,

$$ S_{qB}(\psi) = S_{vN}(\rho). \qquad (67) $$
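Identity (67) is easy to confirm numerically. In this sketch (k = 1), the dimensions D = 30 and d_ν = 7 are hypothetical, and the random subspace ℋ_ν is obtained from a QR decomposition of a random complex matrix, an illustrative choice of ours. The density matrix d_ν^{-1} P_ν has d_ν eigenvalues equal to 1/d_ν and the rest zero, so its von Neumann entropy should be log d_ν.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d_nu = 30, 7                                   # hypothetical dimensions of H and H_nu
z = rng.normal(size=(D, d_nu)) + 1j * rng.normal(size=(D, d_nu))
Q, _ = np.linalg.qr(z)                            # D x d_nu, orthonormal columns spanning H_nu
P_nu = Q @ Q.conj().T                             # orthogonal projection onto H_nu
rho = P_nu / d_nu                                 # the density matrix d_nu^{-1} P_nu of (67)

lam = np.linalg.eigvalsh(rho)
lam = lam[lam > 1e-12]                            # only the d_nu eigenvalues 1/d_nu survive
S_vN = -np.sum(lam * np.log(lam))                 # (63) with k = 1
S_qB = np.log(d_nu)                               # (65) with k = 1
print(S_vN, S_qB)
```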

Let us turn to a comparison between (62) and (65). Since, for a macroscopic system, the contribution −k Σ_ν ‖P_νψ‖² log ‖P_νψ‖² is sufficiently small [16] (it lies between 0 and k log n, where n is the number of macro-states), we have that

$$ S(\psi) \approx k \sum_\nu \|P_\nu\psi\|^2 \log d_\nu. \qquad (68) $$

This quantity is just the weighted average of the S_qB(ν), with the weight of ν given by the quantum-mechanical probability of ν associated with ψ.
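The size of the neglected term can be illustrated with made-up numbers. In the sketch below (k = 1), the three probabilities and the log-dimensions are hypothetical; working with log d_ν rather than d_ν avoids floating-point overflow for macroscopically large dimensions. The difference between (62) and (68) is exactly −Σ_ν ‖P_νψ‖² log ‖P_νψ‖².

```python
import numpy as np

def S_exact(p, log_d):
    """(62) with k = 1, written in terms of p_nu = ||P_nu psi||^2 and log d_nu."""
    p, log_d = np.asarray(p, dtype=float), np.asarray(log_d, dtype=float)
    m = p > 0
    return -np.sum(p[m] * (np.log(p[m]) - log_d[m]))

def S_approx(p, log_d):
    """(68) with k = 1: the p-weighted average of the quantum Boltzmann entropies log d_nu."""
    return np.sum(np.asarray(p, dtype=float) * np.asarray(log_d, dtype=float))

p = np.array([0.5, 0.3, 0.2])               # hypothetical macro-state probabilities
log_d = np.array([1.0e4, 1.2e4, 1.5e4])     # hypothetical log-dimensions of the H_nu
gap = S_exact(p, log_d) - S_approx(p, log_d)
print(gap, np.log(len(p)), gap / S_exact(p, log_d))
```

Here the mixing term stays below log 3 ≈ 1.1 while the entropies are of order 10⁴, which is the sense in which it is negligible.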

We conclude from this relation that S(ψ) should better be regarded as a sort of mean entropy of the system than as its entropy. For comparison, for a classical system whose macro-state is unknown and has probability p_ν to be ν, we would not say that the quantity

$$ \sum_\nu p_\nu\, k \log \mathrm{vol}(\Gamma_\nu) \qquad (69) $$

is the entropy of the system, but we would say instead that the system's entropy S is random, that it equals k log vol(Γ_ν) with probability p_ν, and that (69) is its expected value 𝔼S. Now, the quantum situation is not completely analogous because a quantum superposition of contributions from different ℋ_ν is not the same as a statistical mixture of wave functions from different ℋ_ν, but the analogy is good enough to make us doubt the adequacy of von Neumann's definition (62).

Moreover, there are situations in which S(ψ) decreases, contrary to the second law of thermodynamics. If ψ is a non-trivial superposition

$$ \psi = \sum_\nu c_\nu \psi_\nu \qquad (70) $$

of macroscopically different quantum states ψ_ν ∈ ℋ_ν with ‖ψ_ν‖ = 1 and Σ_ν |c_ν|² = 1 (such as, e.g., a Schrödinger cat state), then a measurement of all macro-observables will yield, with probability |c_ν|², the macro-state ν and the wave function ψ_ν with S(ψ_ν) = S_qB(ψ_ν). Since S(ψ), in the approximation (68), is just the average of the random value S(ψ_ν), the S(ψ_ν) can (and usually will) have significant probability (possibly 50% or even more) to lie below S(ψ).
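A toy computation makes this decrease concrete. The sketch below (k = 1) takes an equal-weight superposition (70) of two macro-states whose hypothetical log-dimensions differ by a macroscopic amount, evaluates S(ψ) from (62) without approximation, and compares it with the post-measurement values S(ψ_ν) = log d_ν, weighting each outcome by its Born probability |c_ν|².

```python
import numpy as np

def S_from_weights(p, log_d):
    """(62) with k = 1, in terms of p_nu = |c_nu|^2 and log d_nu (0 log 0 = 0)."""
    m = p > 0
    return -np.sum(p[m] * (np.log(p[m]) - log_d[m]))

log_d = np.array([1.0e4, 1.3e4])          # hypothetical: macro-state 0 is much smaller than 1
c = np.array([1.0, 1.0]) / np.sqrt(2.0)   # equal-weight "cat" superposition as in (70)
p = np.abs(c) ** 2                        # Born probabilities of the two outcomes

S_before = S_from_weights(p, log_d)       # S(psi) before the macro-measurement
S_after = log_d                           # S(psi_nu) = S_qB(psi_nu) = log d_nu per outcome
prob_decrease = p[S_after < S_before].sum()
print(S_before, S_after, prob_decrease)
```

In this example S(ψ) is the average of the two log d_ν plus log 2, so the outcome ν = 0 (probability 1/2) lowers the entropy by roughly 1500 in units of k.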

7.2 Von Neumann's Quantum H-Theorem

In the previous subsection, we have expressed reservations about the adequacy of von Neumann's definition S(ψ) of the entropy of a macroscopic quantum system. Be that as it may, here is what von Neumann's quantum H-theorem asserts (roughly): For any H without resonances on ℋ, most orthogonal decompositions 𝒟 of ℋ are such that for every wave function ψ_0, S(ψ_t) will be close to its upper bound k log D for most times t. (The detailed statement in the QET article specifies a bound on how close; see von Neumann's Equations (70) and (79)–(84).)
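The qualitative behavior can be simulated in a small toy model. The sketch below is our own illustration, not von Neumann's construction: it assumes k = 1, ħ = 1, a Hamiltonian H that is diagonal in the standard basis of ℂ^200 with energies drawn uniformly from [0, 1] (resonances then occur with probability zero), and a random orthogonal decomposition obtained from a Haar-random unitary into four macro-spaces of hypothetical dimensions 5, 15, 40, and 140. The initial wave function lies in the smallest macro-space, so S(ψ_0) = log 5, far below log D.

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix, with the usual phase fix."""
    z = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def S_of_psi(psi, blocks, dims):
    """(62) with k = 1; blocks[nu] is a D x d_nu matrix with orthonormal columns spanning H_nu."""
    p = np.array([np.sum(np.abs(B.conj().T @ psi) ** 2) for B in blocks])
    d = np.asarray(dims, dtype=float)
    m = p > 0
    return -np.sum(p[m] * np.log(p[m] / d[m]))

rng = np.random.default_rng(1)
dims = [5, 15, 40, 140]                         # hypothetical macro-space dimensions, D = 200
D = sum(dims)
E = rng.uniform(0.0, 1.0, size=D)               # generic energies; H diagonal in the standard basis
U = haar_unitary(D, rng)                        # random orthogonal decomposition of C^D
edges = np.cumsum([0] + dims)
blocks = [U[:, a:b] for a, b in zip(edges[:-1], edges[1:])]

psi0 = U[:, 0]                                  # start inside the smallest macro-space
ts = np.linspace(0.0, 2000.0, 401)
S = np.array([S_of_psi(np.exp(-1j * E * t) * psi0, blocks, dims) for t in ts])

late = S[ts > 200.0]
print(S[0], np.log(D), np.mean(late > np.log(D) - 0.1))
```

The simulation only illustrates "close to k log D for most t"; it says nothing about monotonicity, in line with the limitations discussed below.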

We would like to emphasize that this statement is really just a corollary of the QET. If there is a subspace ℋ_eq with d_eq/D ≈ 1, then for any wave function ψ in thermal equilibrium as defined in (17), (68) entails that

$$ S(\psi) \approx k \log D. \qquad (71) $$

Thus, the qualitative content of the above statement follows already if the system is in thermal equilibrium for most of the time, and thus follows from normal typicality. More generally, even if there is no equilibrium macro-state obeying d_eq/D ≈ 1, the statement follows from (68) and the QET, together with the condition that the number n of macro-states ν be not too large. (Indeed, for most t, according to the QET,

$$ S(\psi_t) \approx k \sum_\nu \frac{d_\nu}{D} \log d_\nu. \qquad (72) $$

For those ν with (say) d_ν > D/(10³ n), we have that log d_ν > log D − log(10³ n) ≈ log D if n is not too large; furthermore, the sum of these d_νs is at least 0.999 D because the sum of the remaining d_νs is a sum of fewer than n terms, each of which is at most D/(10³ n), and thus is less than D/10³. Thus, the right-hand side of (72) is at least 0.999 k(log D − log(10³ n)), which is approximately k log D.)
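The arithmetic of this estimate can be checked directly. The sketch below (k = 1) works with log-dimensions because realistic d_ν overflow floating point; the four log-dimensions are hypothetical. It computes log D, the right-hand side k Σ_ν (d_ν/D) log d_ν of (72), the fraction of D carried by macro-spaces with d_ν > D/(10³ n), and the lower bound 0.999 (log D − log(10³ n)).

```python
import numpy as np

def check_bound_72(log_d, margin=1e3):
    """Check the estimate below (72) with k = 1, using log-dimensions to avoid overflow."""
    log_d = np.asarray(log_d, dtype=float)
    n = log_d.size
    m = log_d.max()
    log_D = m + np.log(np.sum(np.exp(log_d - m)))   # log D = log sum_nu d_nu (log-sum-exp)
    w = np.exp(log_d - log_D)                        # weights d_nu / D
    rhs = np.sum(w * log_d)                          # sum_nu (d_nu / D) log d_nu
    large = log_d > log_D - np.log(margin * n)      # the nu with d_nu > D / (margin * n)
    frac_large = w[large].sum()                      # fraction of D in those macro-spaces
    bound = (1.0 - 1.0 / margin) * (log_D - np.log(margin * n))
    return rhs, bound, log_D, frac_large

rhs, bound, log_D, frac = check_bound_72([600.0, 900.0, 990.0, 1000.0])
print(f"rhs={rhs:.4f}  bound={bound:.4f}  log D={log_D:.4f}  large fraction={frac:.6f}")
```

The statement that at least 99.9% of D lies in the large macro-spaces holds for any list of dimensions; the closeness of the right-hand side of (72) to log D additionally requires log(10³ n) ≪ log D.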

More importantly, there are striking differences between von Neumann's quantum H-theorem and Boltzmann's classical H-theorem [8] (see also [25]). Boltzmann's H-theorem was originally formulated only for systems whose behavior is well described by the Boltzmann equation (i.e., dilute, weakly interacting gases) but can be understood in a more general sense as the assertion that for most initial phase points X_0 ∈ Γ_ν ≠ Γ_eq, the Boltzmann entropy S_B(X_t) increases, up to exceptions, monotonically in both time directions with |t| until it reaches the maximal (equilibrium) value; the exceptions (entropy valleys) are either very short-lived and shallow, or infrequent. This statement is not a mathematical theorem but very plausible.¹⁴

In contrast, what von Neumann's quantum H-theorem implies about the increase of S(ψ_t) (for a system without resonances in the Hamiltonian and a typical decomposition 𝒟) is much less: If, initially, S(ψ_0) is far below k log D, then it will in both time directions sooner or later reach the maximal (equilibrium) value k log D, i.e., there are t_+ > 0 and t_− < 0 such that S(ψ_{t_+}) ≈ k log D ≈ S(ψ_{t_−}); moreover, S(ψ_t) will assume its maximal value for most t in the long run (in both time directions). However, no statement is implied about a largely monotone increase; for example, insofar as von Neumann's quantum H-theorem is concerned, S(ψ_t) may first go down considerably before increasing to k log D. Likewise, no statement is made about the features of the entropy valleys between 0 and t_±; they could, perhaps even for most wave functions, be long-lived, deep, and frequent.

14 Von Neumann apparently did not think of this statement when thinking of the classical H-theorem. He wrote in Section 0.6: "As in classical mechanics, also here there is no way that entropy could always increase, or even have a predominantly positive sign of its [time] derivative (or difference quotient): the time reversal objection as well as the recurrence objection are valid in quantum mechanics as well as in classical mechanics." In fact, the above statement of the H-theorem conveys something that could be called a "predominantly positive sign" of the time difference quotient of S = S_B(X_t) but is not refuted by either time reversal or recurrence. Put very succinctly, this is because, according to the statement, S increases in both time directions and may well decrease after reaching its maximum; see, e.g., [20, 31] for further discussion.

A Precise Statement of von Neumann's Quantum Ergodic Theorem

In von Neumann's article, the statement of the quantum ergodic theorem is distributed over many pages. It may therefore be helpful for the reader if we provide the exact statement of the QET. Readers interested in the exact statement may also wish to look at the two modified versions of the statement that we have described in [22]; although they are in some ways stronger, they also follow from von Neumann's proof.

Definition 1. The system corresponding to a choice of ℋ, ψ_0 ∈ 𝕊(ℋ), H, and 𝒟 is ε-δ'-normal if and only if, for (1 − δ')-most t,

(73)

for every real-linear combination A = Σ_ν a_ν P_ν (i.e., for every self-adjoint operator A from the algebra generated by the macro-observables M_1, M_2, …).

The condition (73) is more or less equivalent, when the number n of macro-spaces ℋ_ν in 𝒟 is much smaller than each of the D/d_ν, to the condition that

for all ν, (74)

which is a precise version of (14). See Section 4 of [22] for more details about the relation between (73) and (74).
Theorem 1. (von Neumann's 1929 QET) Let ε > 0, δ > 0, and δ' > 0. For arbitrary ℋ of finite dimension, any orthogonal decomposition 𝒟 = {ℋ_ν} of ℋ, and any H without resonances, if

$$ \max_{\alpha\neq\beta} \big|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\big|^2 + \max_\alpha \Big(\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D}\Big)^2 < \frac{\varepsilon^2\,\delta'\,d_\nu}{D\,n^2} \quad \text{for all } \nu \qquad (75) $$

with d_ν = dim ℋ_ν, D = dim ℋ, and n = #𝒟, then for every ψ_0 ∈ 𝕊(ℋ) the system is ε-δ'-normal. Moreover, suppose we are given natural numbers D, n, and d_1, ..., d_n such that d_1 + ... + d_n = D and, for all ν,
maxjd, log D < d u < D/C x , (76) 

where C\ is a universal constant. Then (1 — 5) -most orthogonal decompositions 3) = 
of Jfl? with dim^, = d v are such that (!75|) is satisfied (and thus the system is 
e-5' -normal for every ip G §(^f ) ). 
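The left-hand side of condition (75), max_{α≠β} |⟨φ_α|P_ν|φ_β⟩|² + max_α (⟨φ_α|P_ν|φ_α⟩ − d_ν/D)², is straightforward to evaluate for a concrete decomposition. The following is only a sketch: it assumes H is diagonal in the standard basis of ℂ^D (so the φ_α are the standard basis vectors), draws the decomposition from a Haar-random unitary (an illustrative choice), and uses hypothetical dimensions 50, 150, 300. It reports the left-hand side divided by d_ν log D / D², which is the order of magnitude suggested by a rough random-matrix heuristic (off-diagonal entries of size about √d_ν / D, maximized over about D² pairs). It does not evaluate the constants in (76).

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix, with the usual phase fix."""
    z = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def lhs_75(P, d, D):
    """max_{a != b} |<phi_a|P|phi_b>|^2 + max_a (<phi_a|P|phi_a> - d/D)^2, with the phi_a taken
    to be the standard basis vectors (i.e. H is assumed diagonal in that basis)."""
    off = np.abs(P) ** 2
    np.fill_diagonal(off, 0.0)                  # keep only alpha != beta
    diag = np.real(np.diag(P))
    return off.max() + np.max((diag - d / D) ** 2)

rng = np.random.default_rng(2)
dims = [50, 150, 300]                           # hypothetical macro-space dimensions, D = 500
D = sum(dims)
U = haar_unitary(D, rng)                        # random orthogonal decomposition of C^D
edges = np.cumsum([0] + dims)

ratios = []
for a, b in zip(edges[:-1], edges[1:]):
    d = int(b - a)
    P = U[:, a:b] @ U[:, a:b].conj().T          # projection P_nu onto the nu-th block
    ratios.append(lhs_75(P, d, D) / (d * np.log(D) / D**2))
print(ratios)
```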

One of the stronger versions of the QET described in [22] asserts that, under somewhat stronger assumptions on the d_νs, (73) and (74) can be replaced with the stronger (and more natural) error bound

$$ \Big|\, \|P_\nu\psi_t\|^2 - \frac{d_\nu}{D} \Big| < \varepsilon\, \frac{d_\nu}{D} \quad \text{for all } \nu. \qquad (77) $$
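The relative error bound (77) can be observed in a toy simulation. This sketch rests on assumptions of ours: ħ = 1, H diagonal in the standard basis of ℂ^1000 with energies drawn uniformly from [0, 1], a Haar-random decomposition into macro-spaces of hypothetical dimensions 100, 300, 600, an initial wave function inside the smallest macro-space, and the illustrative tolerance ε = 0.3. It records, over many late times, whether |‖P_νψ_t‖² − d_ν/D| < ε d_ν/D holds simultaneously for all ν.

```python
import numpy as np

rng = np.random.default_rng(3)
dims = [100, 300, 600]                         # hypothetical macro-space dimensions, D = 1000
D, eps = sum(dims), 0.3                        # eps: illustrative tolerance in (77)

z = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2.0)
q, r = np.linalg.qr(z)
U = q * (np.diag(r) / np.abs(np.diag(r)))      # Haar-random unitary; column blocks span the H_nu
E = rng.uniform(0.0, 1.0, size=D)              # H diagonal in the standard basis, generic spectrum
psi0 = U[:, 0]                                 # initial wave function inside the smallest H_nu

edges = np.cumsum([0] + dims)
blocks = [U[:, a:b] for a, b in zip(edges[:-1], edges[1:])]
target = np.array(dims) / D                    # micro-canonical values d_nu / D

ts = np.linspace(100.0, 5000.0, 200)           # late times only
ok = []
for t in ts:
    psi_t = np.exp(-1j * E * t) * psi0         # time evolution in the energy eigenbasis
    p = np.array([np.sum(np.abs(B.conj().T @ psi_t) ** 2) for B in blocks])
    ok.append(bool(np.all(np.abs(p - target) < eps * target)))
frac_ok = float(np.mean(ok))
print(frac_ok)
```

With these dimensions the relative fluctuations of ‖P_νψ_t‖² are roughly √((1 − d_ν/D)/d_ν), about 0.1 for the smallest macro-space, so ε = 0.3 is met at nearly all sampled times; smaller d_ν would require a larger ε.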



B Von Neumann's Notation and Terminology 

The following table lists some of von Neumann's notation, and the (different) notation we use here.

vN                    here                 meaning
A, B, ...             A, B, ...            operators on Hilbert space
(φ, ψ)                ⟨ψ|φ⟩                inner product in Hilbert space
H                     H                    Hamiltonian operator
φ_{ρ,a}, W_{ρ,a}      φ_α, E_α             eigenfunction, eigenvalue of H
P_ψ                   |ψ⟩⟨ψ|               projection to 1-d subspace spanned by ψ
U                     ρ                    density operator
[W_a, W_{a+1})                             a-th energy interval
                      ℋ                    a-th energy shell
Δ_a                                        projection to the a-th energy shell
S_a                   D                    dimension of the a-th energy shell
N_a                   n                    number of macro-spaces in the a-th energy shell
                      ℋ_ν                  ν-th macro-space (in the a-th energy shell)
E_{ν,a}               P_ν                  projection to ℋ_ν
s_{ν,a}               d_ν                  dimension of ℋ_ν
ω_{λ,ν,a}                                  an orthonormal basis of ℋ_ν
M_t{f(t)}             $\overline{f(t)}$    time average of f(t)
𝔐                     𝔼                    ensemble average
Since for us, but not for von Neumann, the index a is fixed throughout, it is usually omitted in our notation.

When von Neumann spoke of an orthogonal system in Hilbert space in the QET article, he meant an orthonormal system; this becomes clear from his Equation (20) in Footnote 25, where he uses P_ψ f = (f, ψ)ψ, which is true only when ‖ψ‖ = 1, for an element ψ of an "orthogonal system."

Von Neumann used the expression macroscopic observer in the QET article when referring to the family 𝒟 = {ℋ_ν} of macro-spaces, or, equivalently, to the family {M_1, M_2, …} of commuting observables. This is perhaps not a fitting terminology, as it may suggest that different people would have different sets like {M_1, M_2, …} associated with them, which is not the case. It is perhaps even an unfortunate terminology, as it may further suggest that the (uniform) distribution over all 𝒟s that the QET refers to represents the distribution of different people's 𝒟 in some population of observers. Instead, this distribution should be regarded as just the mathematical means for expressing what is true of most orthogonal decompositions 𝒟.

Von Neumann used the expression energy surface in quantum mechanics in different sections of his article with different meanings. The main meaning, used in his Section 1, is what we call an energy shell: the subspace ℋ corresponding to a narrow interval [W_a, W_{a+1}) of energies. (Sometimes, he used "energy surface" when referring to the corresponding micro-canonical density matrix ρ_mc, which is 1/D times the projection to ℋ, or, in von Neumann's notation, (1/S_a)Δ_a.) In Section 0.4, he used "energy surface" for the torus (36) with fixed radii r_a > 0.

The expression micro-canonical ensemble undergoes a similar change in meaning: While in Section 0.4 it refers to the uniform distribution over the torus (36), its main meaning, used in Section 1, is what we call the micro-canonical density matrix ρ_mc. However, besides that, von Neumann also used that expression for another density matrix U_ψ, associated with a given wave function ψ ∈ ℋ_total with contributions from several different energy shells: it is a mixture of the projections to the different energy shells ℋ_a with weights provided by ψ. In his words (after his Equation (32) in Section 1.3): "Now we are ready to define the micro-canonical ensemble pertaining to the state ψ. [... W]e define it to be the mixture of the (1/S_1)Δ_1, (1/S_2)Δ_2, ... with weights (Δ_1ψ, ψ), (Δ_2ψ, ψ), ...." This further complication vanishes when one focuses, as one can without damage, on wave functions from one particular energy shell.